Stock Portfolio Analytics¶

Toronto, September 16 2024
Autor : Atsu Vovor

Master of Management in Artificial Intelligence,
Consultant Data Analytics Specialist | Machine Learning | Data science | Quantitative Analysis |French & English Bilingual

Abstract¶

This project presents the development of an advanced stock portfolio analytics tool designed to assist portfolio managers in optimizing investment strategies. By leveraging statistical analysis, mathematical and machine learning techniques, the tool provides insights into stock asset pricing, risk assessment, asset allocation, and performance forecasting. The project outlines the methodology used, including data collection and preprocessing, explanatory datanalysis, model selection and evaluation metrics, stress testing under economic key performance indicators scenarios. Results demonstrate the tool's effectiveness in enhancing decision-making processes, potentially leading to improved portfolio performance. The findings highlight the importance of integrating modern analytics into traditional portfolio management to navigate the complexities of today's financial markets.

Introduction¶

The growing complexity of financial instruments and risk factors places significant pressure on portfolio managers, who must navigate and analyze a vast and intricate flow of data each day. Utilizing a robust dataset comprising historical stock prices, economic indicators, and financial metrics, our goal is to develop an advanced stock portfolio analysis tool that leverages advanced statistical methods, portfolio optimization and machine learning techniques to assist portfolio managers in making informed decisions. The tool provides insights into the asset pricing, risk assessment, asset allocation, and performance forecasting.

To achieve this goal, we begin by dynamically collecting real time data of all the S&P/TSX composite constituents adjust closed prices and canadian economic factors. The methodology used involves data preprocessing to ensure accuracy and relevance, followed by exploratory data analysis (EDA) to uncover key trends and correlations. Principal Component Analysis (PCA) is applied to reduce the dimensionality of the dataset, enabling the identification of the most influential factors affecting portfolio performance. We then use correlation analysis and hierarchical clustering to categorize stocks into distinct groups, facilitating diversification and risk management.

Moreover, the project explores advenced assets pricing technics sach as Stochastic Differencial Equation and Monte Carlo Simulation combined with modern portfolio theory (MPT) to simulate the portfolio price, profit & lost, risk and construct efficient portfolios, and stress testing techniques to evaluate portfolio robustness under various economic scenarios. The results demonstrate significant improvements in risk-adjusted returns, providing actionable insights for portfolio managers and investors.

In conclusion, this project underscores the importance of integrating advanced analytics into investment decision-making processes. The findings offer a valuable framework for optimizing stock portfolios, enhancing performance, and managing risk in an increasingly complex financial environment.

Description¶

This Project presents an in-depth analysis of stock portfolio management through the application of advanced data analytics techniques. The project aims to address the challenges faced by investors in optimizing their portfolios by incorporating a data-driven approach to decision-making. By analyzing historical stock prices, financial indicators, and macroeconomic variables, the project seeks to develop strategies that maximize returns while minimizing risk.

Scope of the Project

The scope of this project includes the following key areas:

1. Data Collection and Preprocessing:

  • The project begins with the collection of a comprehensive dataset that includes historical stock prices, financial ratios, and relevant economic indicators.
  • Data preprocessing steps are undertaken to clean and prepare the data, ensuring accuracy, consistency, and relevance. This includes handling missing data, normalizing variables, and filtering out noise.

2. Exploratory Data Analysis (EDA):

  • EDA is conducted to uncover underlying trends, correlations, and patterns within the data. This step provides insights into the behavior of individual stocks and the market as a whole, laying the foundation for further analysis.
  • Visualization techniques are employed to illustrate key findings and to identify potential opportunities for portfolio optimization.

3. Dimensionality Reduction and Portfolio Construction using Correlation Analysis, Clustering and Principal Component Analysis (PCA)

  • Correlation Analysis, Clustering and Portfolio Construction Hierarchical clustering techniques are applied to group stocks into clusters based on their similarities in performance, risk profile, and other attributes. This clustering facilitates the selection of a diversified set of assets for portfolio construction, ensuring that the portfolio is balanced and less susceptible to market shocks.

  • Principal Component Analysis (PCA) To manage the complexity of the dataset and to focus on the most impactful variables, PCA is utilized to reduce the number of factors considered in the analysis.It helps in identifying the principal components that explain the majority of the variance in the data, enabling the selection of the most relevant indicators for portfolio construction.

  • Stacking PCA,Correlation Analysis and Clustering for Diversified Portfolio Construction stacking Correlation Analysis, Clustering and Principal Component Analysis (PCA) helps to construct a well diversified portfolio

4. Asset Pricing, Profit & Lost simulation and Risk calculation*

  • Lognormal of asset returns, Covarariance Matrix Cholesky Decomposition applied to Monte Carlo Simulation for asset pricing and Profit & Lost simulation.
  • Value at Risk(VaR) and Conditional Value at Risk(CVaR) calculation

5. Portfolio Optimization:

  • Modern Portfolio Theory (MPT) is implemented to construct efficient portfolios that optimize the trade-off between risk and return.
  • The optimization process involves determining the boundary random portfolios assets and weights that maximize the portfolio's expected return for a given level of risk or minimize risk for a given level of expected return or a given risk level.
  • Using Monte Carlo simulation to generate Efficient Frontier
  • Machine Learning technics are used to improve the optimization process by modelling the boundary random portfolios assets that maximize the portfolio's expected return for a given level of risk or minimize risk for a given level of expected return or a given risk level.
  • Investment strategies are bult for optimal portfolios(minimal risk portfolio, maximal return portfolio, sharpe ratio (tangent portfolio)

6. Investment Risk Profiles Simulation using K-Means Clustering applied to random portfolio

  • The simulated portfolio risk is combigned with the simulated the portfolio expected return and the predicted expected return to set the randomn efficient frontier data. The randomn efficient frontier data is then used as input for the K-means cluster models to simulate the instment risk profile and investment strategy.

7. Stress Testing and Scenario Analysis:

  • Stress testing is conducted to evaluate the portfolio's performance under different economic scenarios, including adverse market conditions.
  • This analysis provides insights into the portfolio’s resilience and helps in identifying potential vulnerabilities.

Tools and Technologies

The project leverages various tools and technologies, including:

  • Python: For data analysis, statistical modeling, and machine learning.
  • Pandas and NumPy: For data manipulation and numerical computations.
  • Matplotlib and Seaborn: For data visualization.
  • Scikit-learn: For machine learning, PCA, and clustering.
  • Optimization Libraries: For portfolio optimization using MPT.
  • Financial Databases: To source historical data, including stock prices and economic indicators.

Key Outcomes The project yields several key outcomes:

  • Identification of the most influential economic indicators and stock characteristics for portfolio management.
  • Creation of optimized portfolios that demonstrate improved risk-adjusted returns.
  • Insights into portfolio performance under various market conditions, aiding in risk management and strategic planning.
In [1]:
#pip install stats-can
#conda pip install -c districtdatalabs yellowbrick on Anaconda Prompt
#conda install conda=24.5.0
#conda install conda-forge::stats_can 

Import Libraries¶

In [2]:
import yfinance as yf
import pandas as pd
from datetime import date, timedelta
import seaborn as sns 
import numpy as np
import matplotlib.pyplot as plt 
from sklearn.preprocessing import LabelEncoder
from scipy.stats import norm, lognorm, exponnorm, logistic, erlang,gennorm 
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error
#from sklearn.metrics import root_mean_squared_error
from sklearn.metrics import r2_score
from sklearn.metrics import mean_squared_log_error
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
#from yellowbrick.cluster import KElbowVisualizer
from scipy.optimize import curve_fit
import random
from statistics import NormalDist
from scipy import stats
from fitter import Fitter, get_common_distributions, get_distributions 
import matplotlib.transforms as transforms
from matplotlib.table import table
from numpy import arange
from pandas import read_csv
from scipy.optimize import curve_fit
import warnings
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from sklearn.model_selection import train_test_split, cross_val_score
from tabulate import tabulate
from pandas.plotting import lag_plot
import re
#from stats_can import StatsCan as sc
from stats_can import StatsCan
sc = StatsCan()
from scipy.cluster.hierarchy import fcluster

#import pandas_datareader.data as web

1. Index Contents Data Collection and Preprocessing¶

In this section, we will read all the S&P/TSX composite constituents table from wikipedia(https://en.wikipedia.org/wiki/S%26P/TSX_Composite_Index). Then we will get the tickers adjusted close prices from Yahoo Finance using yfinance library. We will clean the data by removing all the empty rows and columns. With more than 200 remaining tickers, we will calculate the assets log return and we will remove all the assets with negative expected return. We will couple Correlation Analysis with Principle Component Analysis to reduce the volume of assets and keep only most important assets. The Correlation Analysis will be used to identify and remove redundant assets. The end result will be a well diversify portfolio.

In [3]:
#--------------------------------------------- 1. Index Contents Data Collection and Preprocessing ------------------------------------------------
#read the index content from wikipedia and return the index content data frame
def read_index_content(content_html,web_tab_number):
    S_and_P_TSX_Composite = pd.read_html(content_html)[web_tab_number] 
    index_content_df = S_and_P_TSX_Composite[['Ticker','Company','Sector [10]','Industry [10]']]
    index_content_df = index_content_df.rename(columns={"Sector [10]": "Sector", "Industry [10]": "Industry"})
    return index_content_df
    #return S_and_P_TSX_Composite[['Ticker','Company','Sector [10]','Industry [10]']].head()
    
    
#extract the index tickers 
def generate_ticker_df(index_content_df):
    index_content_tickers_list = index_content_df['Ticker']
    index_content_tickers_list = index_content_tickers_list.tolist()
    new_index_content_tickers_list = []
    for item in index_content_tickers_list:
        new_index_content_tickers_list.append(str(item))
    return new_index_content_tickers_list

  
#--------------------------------------------------------------------------------------------------------------------
#Description:Extract adj close price for each stock on the index from Yahoo Finance web site and clean the data 
#Input:start date, end date, index ticker list
#Return the index Adj close price data frame
#-----------------------------------------------------------------------------------------------------------------------
def start_date(reporting_year_period = 365*5):
    return pd.Timestamp.today() - pd.Timedelta(days = reporting_year_period)

def create_adj_close_price_df(reporting_year_period, content_ticker_list):
    start_date = reporting_year_period
    end_date = date.today()
    selected_assets_yahoo_adj_close_price_data = yf.download(content_ticker_list, start_date, end_date, ['Adj Close'], period ='max')
    selected_assets_adj_close_price_df = selected_assets_yahoo_adj_close_price_data['Adj Close']
    index_adj_close_price_df = selected_assets_adj_close_price_df.dropna(axis=1)
    return index_adj_close_price_df

def asset_daily_price(price_df,number_of_asset):
    print('\nPlotting the first 5 assets daily adj closed prices\n')
    price_df.iloc[:,:number_of_asset].plot(figsize=(15,6)) 
    plt.show()
  
In [4]:
print('\nData collection and preprocessing\n')
index_content_df = read_index_content('https://en.wikipedia.org/wiki/S%26P/TSX_Composite_Index',3)
content_ticker_list = generate_ticker_df(index_content_df)
index_adj_close_price_df = create_adj_close_price_df( start_date(365*5), content_ticker_list )
print('\nList of companies\n')
display(index_content_df)
print('\nAdjusted Close Price Data Frame\n')
display(index_adj_close_price_df)
print('\nData structure\n')
index_adj_close_price_df.info()
print('\nData statics summary\n')
display(index_adj_close_price_df.describe().transpose())
Data collection and preprocessing

[*********************100%%**********************]  225 of 225 completed
105 Failed downloads:
['REI.UN', 'FRU', 'EMA', 'WN', 'NWC', 'DML', 'TOU', 'OLA', 'WDO', 'FIL', 'CFP', 'KEL', 'NPI', 'FFH', 'BIR', 'POU', 'AOI', 'INE', 'KNT', 'SRU.UN', 'CU', 'WTE', 'LUN', 'WSP', 'MTY', 'RUS', 'EFN', 'MRU', 'MATR', 'DFY', 'EIF', 'TIH', 'LNR', 'GWO', 'ATD', 'EQB', 'TOY', 'IFP', 'CCL.B', 'DSG', 'CJT', 'BBD.B', 'ARX', 'WPK', 'IMG', 'ABX', 'FVI', 'IFC', 'RCH', 'KXS', 'IVN', 'LUG', 'AAV', 'ALA', 'SIA', 'CPX', 'GEI', 'WCP']: Exception('%ticker%: No price data found, symbol may be delisted (1d 2019-09-18 15:57:34.705069 -> 2024-09-16)')
['CPG', 'ERF', 'ENGH', 'TCN']: Exception('%ticker%: No data found, symbol may be delisted')
['HR.UN', 'IPCO', 'BIP.UN', 'CCA', 'FCR.UN', 'PKI', 'PMZ.UN', 'NWH.UN', 'CSU', 'TECK.B', 'CS', 'EMP.A', 'BEI.UN', 'BEP.UN', 'QBR.B', 'CRT.UN', 'ATRL', 'ATH', 'IIP.UN', 'AP.UN', 'FTT', 'CNR', 'CRR.UN', 'KMP.UN', 'GRT.UN', 'DPM', 'CAR.UN', 'POW', 'TCL.A', 'DIR.UN', 'CHP.UN', 'MTL', 'TSU', 'GIB.A', 'RCI.B', 'TA', 'BDGI', 'BBU.UN', 'CSH.UN', 'ONEX', 'ACO.X', 'CTC.A']: Exception('%ticker%: No timezone found, symbol may be delisted')
['HWX']: Exception("%ticker%: Period 'max' is invalid, must be one of ['1d', '5d']")

List of companies

Ticker Company Sector Industry
0 AAV Advantage Energy Ltd. Energy Oil & Gas Exploration and Production
1 AOI Africa Oil Corp. Energy Oil & Gas Exploration and Production
2 AEM Agnico Eagle Mines Limited Basic Materials Metals & Mining
3 AC Air Canada Industrials Transportation
4 AGI Alamos Gold Inc. Basic Materials Metals & Mining
... ... ... ... ...
220 WTE Westshore Terminals Investment Corporation Industrials Transportation
221 WPM Wheaton Precious Metals Corp. Basic Materials Metals & Mining
222 WCP Whitecap Resources Inc. Energy Oil & Gas Exploration and Production
223 WPK Winpak Ltd. Consumer Cyclical Packaging & Containers
224 WSP WSP Global Inc. Industrials Construction

225 rows × 4 columns

Adjusted Close Price Data Frame

AC AEM AGI AQN ATS BB BCE BHC BLDP BLX ... TPZ TRI TRP TVE TXG VET WCN WFG WPM X
Date
2019-09-18 35.345322 50.151981 5.939981 10.246118 13.800000 7.52 36.354038 23.190001 5.47 14.192307 ... 11.939594 59.539158 37.488026 22.429239 62.000000 14.717662 86.776352 39.590134 25.364801 12.059963
2019-09-19 35.645519 50.451710 6.111328 10.291792 13.800000 7.57 36.248428 23.150000 5.24 14.113462 ... 11.939594 59.698231 37.650688 22.429239 61.119999 14.820879 87.075195 39.590134 25.617979 10.713511
2019-09-20 34.851452 51.306839 6.206520 10.352691 13.710000 7.54 36.489826 22.860001 5.37 14.070457 ... 11.995965 58.991207 38.227425 22.552620 60.500000 15.001518 87.364365 39.590134 25.692993 10.471342
2019-09-23 34.899876 52.470490 6.330270 10.337466 13.710000 7.51 36.429478 22.559999 5.56 14.034618 ... 11.982703 59.194481 38.456642 22.420422 60.330002 15.010121 87.711403 38.624985 26.340006 10.694136
2019-09-24 34.599686 52.805489 6.387385 10.512550 13.670000 5.81 36.678429 22.020000 5.44 14.063291 ... 11.916386 59.530315 38.360523 22.402796 54.299999 14.588629 88.010254 38.803024 26.668209 10.374474
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2024-09-09 32.720001 77.760002 18.135992 5.280000 26.260000 2.35 36.080002 6.200000 1.69 31.059999 ... 18.049999 168.600006 47.070000 22.520000 21.820000 9.040000 185.000000 87.669998 58.560001 32.820000
2024-09-10 32.880001 78.910004 18.675280 5.310000 26.340000 2.38 35.299999 6.250000 1.72 30.760000 ... 18.049999 171.449997 45.790001 22.480000 21.660000 8.940000 184.679993 87.489998 59.410000 31.219999
2024-09-11 32.799999 79.099998 18.885000 5.350000 26.510000 2.45 35.189999 6.380000 1.75 29.980000 ... 17.940001 172.130005 45.880001 22.520000 22.100000 9.150000 185.389999 86.760002 59.270000 33.389999
2024-09-12 33.000000 81.860001 20.059999 5.390000 26.200001 2.47 35.259998 6.300000 1.72 30.200001 ... 18.100000 173.759995 46.090000 22.520000 22.549999 9.200000 185.990005 88.260002 61.369999 34.740002
2024-09-13 33.220001 83.169998 20.690001 5.500000 25.709999 2.48 35.400002 6.320000 1.80 30.740000 ... 18.200001 172.699997 46.549999 22.510000 22.370001 9.220000 185.679993 90.510002 62.560001 36.070000

1256 rows × 99 columns

Data structure

<class 'pandas.core.frame.DataFrame'>
Index: 1256 entries, 2019-09-18 00:00:00 to 2024-09-13 00:00:00
Data columns (total 99 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   AC      1256 non-null   float64
 1   AEM     1256 non-null   float64
 2   AGI     1256 non-null   float64
 3   AQN     1256 non-null   float64
 4   ATS     1256 non-null   float64
 5   BB      1256 non-null   float64
 6   BCE     1256 non-null   float64
 7   BHC     1256 non-null   float64
 8   BLDP    1256 non-null   float64
 9   BLX     1256 non-null   float64
 10  BMO     1256 non-null   float64
 11  BN      1256 non-null   float64
 12  BNS     1256 non-null   float64
 13  BTE     1256 non-null   float64
 14  BTO     1256 non-null   float64
 15  BYD     1256 non-null   float64
 16  CAE     1256 non-null   float64
 17  CCO     1256 non-null   float64
 18  CG      1256 non-null   float64
 19  CIGI    1256 non-null   float64
 20  CIX     1256 non-null   float64
 21  CLS     1256 non-null   float64
 22  CM      1256 non-null   float64
 23  CNQ     1256 non-null   float64
 24  CP      1256 non-null   float64
 25  CVE     1256 non-null   float64
 26  CWB     1256 non-null   float64
 27  DOL     1256 non-null   float64
 28  DOO     1256 non-null   float64
 29  EFR     1256 non-null   float64
 30  ELD     1256 non-null   float64
 31  ENB     1256 non-null   float64
 32  EQX     1256 non-null   float64
 33  ERO     1256 non-null   float64
 34  FM      1256 non-null   float64
 35  FNV     1256 non-null   float64
 36  FR      1256 non-null   float64
 37  FSV     1256 non-null   float64
 38  FTS     1256 non-null   float64
 39  GIL     1256 non-null   float64
 40  GOOS    1256 non-null   float64
 41  GSY     1256 non-null   float64
 42  H       1256 non-null   float64
 43  HBM     1256 non-null   float64
 44  IAG     1256 non-null   float64
 45  IGM     1256 non-null   float64
 46  IMO     1256 non-null   float64
 47  K       1256 non-null   float64
 48  KEY     1256 non-null   float64
 49  L       1256 non-null   float64
 50  LAAC    1256 non-null   float64
 51  MAG     1256 non-null   float64
 52  MFC     1256 non-null   float64
 53  MG      1256 non-null   float64
 54  MX      1256 non-null   float64
 55  NAN     1256 non-null   float64
 56  NG      1256 non-null   float64
 57  NGD     1256 non-null   float64
 58  NTR     1256 non-null   float64
 59  NXE     1256 non-null   float64
 60  OGC     1256 non-null   float64
 61  OR      1256 non-null   float64
 62  OSK     1256 non-null   float64
 63  OTEX    1256 non-null   float64
 64  PAAS    1256 non-null   float64
 65  PBH     1256 non-null   float64
 66  PD      1256 non-null   float64
 67  PEY     1256 non-null   float64
 68  PPL     1256 non-null   float64
 69  PRMW    1256 non-null   float64
 70  PSI     1256 non-null   float64
 71  PSK     1256 non-null   float64
 72  QSR     1256 non-null   float64
 73  RY      1256 non-null   float64
 74  SAP     1256 non-null   float64
 75  SHOP    1256 non-null   float64
 76  SII     1256 non-null   float64
 77  SIL     1256 non-null   float64
 78  SJ      1256 non-null   float64
 79  SLF     1256 non-null   float64
 80  SPB     1256 non-null   float64
 81  SSL     1256 non-null   float64
 82  SSRM    1256 non-null   float64
 83  STN     1256 non-null   float64
 84  SU      1256 non-null   float64
 85  T       1256 non-null   float64
 86  TD      1256 non-null   float64
 87  TFII    1256 non-null   float64
 88  TLRY    1256 non-null   float64
 89  TPZ     1256 non-null   float64
 90  TRI     1256 non-null   float64
 91  TRP     1256 non-null   float64
 92  TVE     1256 non-null   float64
 93  TXG     1256 non-null   float64
 94  VET     1256 non-null   float64
 95  WCN     1256 non-null   float64
 96  WFG     1256 non-null   float64
 97  WPM     1256 non-null   float64
 98  X       1256 non-null   float64
dtypes: float64(99)
memory usage: 981.2+ KB

Data statics summary

count mean std min 25% 50% 75% max
AC 1256.0 36.501842 3.103101 25.417490 34.364492 36.198771 38.322836 61.728199
AEM 1256.0 53.826276 9.554856 32.276482 47.322629 52.197571 58.678662 83.169998
AGI 1256.0 9.527597 3.251711 3.709501 7.324404 8.330248 11.803682 20.690001
AQN 1256.0 9.877426 2.822586 4.757608 6.719469 10.848601 12.409449 14.397230
ATS 1256.0 28.954682 10.538851 10.000000 17.435000 31.440001 38.165001 48.730000
... ... ... ... ... ... ... ... ...
VET 1256.0 11.502070 5.511316 1.587390 7.007686 11.738775 14.317163 28.071140
WCN 1256.0 124.517144 25.728199 69.163422 99.587046 127.753422 137.964344 186.500000
WFG 1256.0 67.813160 18.790793 14.714417 59.307963 73.519878 80.926613 98.821205
WPM 1256.0 41.093440 8.054813 22.332947 37.355909 41.533010 45.807121 62.560001
X 1256.0 23.344395 10.808583 4.768804 16.141821 23.383692 29.996864 49.411919

99 rows × 8 columns

2. Exploratory Data Analysis (EDA)¶

In [5]:
def plot_assets_distribution(df,xlabel, ylabel, title=''):
    # Define the number of assets
    n_assets = df.shape[1]
    # Create subplots
    fig, axes = plt.subplots(1, n_assets, figsize=(23,  3))
    if n_assets == 1:
        axes = [axes]

    # Iterate over each asset
    for i, asset in enumerate(df.columns):
        g =sns.histplot(df[asset], kde=True, ax=axes[i])
        axes[i].set_title(f'{title + asset}')
        axes[i].set_xlabel(xlabel)
        axes[i].set_ylabel(ylabel)
        
        # Calculate and display statistics
        mean_return = df[asset].mean()
        std_dev = df[asset].std()
        skewness = df[asset].skew()
        kurtosis = df[asset].kurtosis()

         # Add statistics below the plot
        statistics = (f"Mean: {mean_return:.4f}\n"
                 f"Std Dev: {std_dev:.4f}\n"
                  f"Skewness: {skewness:.4f}\n"
                 f"Kurtosis: {kurtosis:.4f}")
    
        # Place the text under the plot
        axes[i].text(0.3, -0.3, statistics, transform=axes[i].transAxes, 
                fontsize=10, verticalalignment='top', bbox=dict(boxstyle="round,pad=0.3", edgecolor="black", facecolor="lightgrey"))

def normalize_asset_daily_price(price_df):
    return (price_df / price_df.iloc[0])*100
    
       
def plot_normalize_asset_daily_price(p_normalized_asset_daily_price_df,number_of_asset):

    i_normalized_asset_daily_price_df = p_normalized_asset_daily_price_df.iloc[:,:number_of_asset]
    #normalized_asset_daily_price_df = (normalized_asset_daily_price_df / normalized_asset_daily_price_df.iloc[0])*100
    #normalized_asset_cols_size = len(normalized_asset_daily_price_df.columns)
    i_normalized_asset_daily_price_df.plot(figsize = (15, 6))
    #plt.show()
    plot_assets_distribution(i_normalized_asset_daily_price_df, 'Adjusted Close Price','Frequency')

def calculate_stock_price_log_return(index_adj_close_price_df):
    log_returns = np.log(index_adj_close_price_df / index_adj_close_price_df.shift(1))
    log_returns = log_returns.dropna(how = 'all')
    return log_returns

#removing asset with negative expected return
def removing_assets_with_negative_expected_return(log_returns,threshold):
    # Calculate the correlation matrix
    #corr_matrix = expected_returns.corr()
    # Create a list to store uncorrelated assets
    assets_with_positive_expected_return = []
    # Iterate through the correlation matrix
    for asset in log_returns.columns:
        # Check if the asset is uncorrelated with all other assets
        #for other_assets in corr_matrix.columns:
            if log_returns.mean()[asset] > threshold:
                assets_with_positive_expected_return.append(asset) 
    assets_with_positive_expected_return_list = list(dict.fromkeys(assets_with_positive_expected_return))
    return assets_with_positive_expected_return_list


def positive_assets_log_returns_df(log_returns_df, positive_assets_list):
    return log_returns_df[positive_assets_list]

def stocks_initial_price(positive_assets_list):  
    return index_adj_close_price_df.iloc[0][positive_assets_list]

def generate_asset_volatility(frequency_date_column, log_return_df):
    frequency = frequency_date_column[0].upper()
   

    assets_volatility_df = log_return_df.rolling(center=False,window= 252).std() * np.sqrt(252)
    for col in list(assets_volatility_df.columns):
        assets_volatility_df = assets_volatility_df.rename(columns={col: col+' Volatility'})
    
    assets_volatility_df = assets_volatility_df.dropna(axis=0)
    
    assets_volatility_df[frequency_date_column] = pd.to_datetime(assets_volatility_df.index, format = '%m/%Y')
    assets_volatility_df[frequency_date_column] = assets_volatility_df[frequency_date_column].dt.to_period(frequency)
        
    assets_volatility_df.set_index(frequency_date_column, inplace=True)
    assets_volatilities = assets_volatility_df.groupby(frequency_date_column).mean()
    #assets_volatilities = round(assets_volatilities,4)
    assets_volatilities = assets_volatilities.dropna(axis=0)
    return assets_volatilities

def plotting_assets_log_returns(df,xlabel, ylabel, title=''):
    # Define the number of assets
    n_assets = df.shape[1]
    # Create subplots
    fig, axes = plt.subplots(1, n_assets, figsize=(20,  3))
   
    if n_assets == 1:
        axes = [axes]
    
    for i, column in enumerate(df.columns):
        axes[i].plot(df[column], label=column)
        axes[i].set_title(f'{column}')
        axes[i].set_xlabel(xlabel)
        axes[i].set_ylabel(ylabel)
       

    # Set common labels
    plt.xlabel(xlabel, fontsize=12)
    #plt.tight_layout()
    #plt.show()
    
    
def plotting_assets_volatility(df,xlabel, ylabel, title=''):
    
    # Define the number of assets
    n_assets = df.shape[1]
    # Create subplots
    fig, axes = plt.subplots(1, n_assets, figsize=(30,  3))
   
    if n_assets == 1:
        axes = [axes]
        
    df.index= df.index.to_timestamp()
    #df.index = date.dt.strftime('%Y')
    for i, ticker in enumerate(df.columns):
        axes[i].plot(df.index, df[ticker], label=ticker)
        axes[i].set_title(f'{ticker}')
        axes[i].set_xlabel(xlabel)
        axes[i].set_ylabel(ylabel)
        
    plt.xlabel(xlabel, fontsize=8)
    plt.tight_layout(pad=0.05, w_pad=0.01, h_pad=1.0)

    #plt.show()
    

    
def portfolio_arihtmetics(log_returns,stocks_initial_prices):
    return pd.DataFrame({'mu expected_return':log_returns.mean(),
                         'variance':log_returns.var(),
                         'Sigmas(volatilities)':log_returns.std(),
                         'modifiy shape(Er)/𝝈':log_returns.mean()/log_returns.std(),
                         'initial price': stocks_initial_prices}).transpose()
In [6]:
number_of_asset =5
normalized_asset_daily_price_df = normalize_asset_daily_price(index_adj_close_price_df)
stock_price_log_return = calculate_stock_price_log_return(normalized_asset_daily_price_df)
log_returns = positive_assets_log_returns_df(stock_price_log_return, 
                                             removing_assets_with_negative_expected_return(stock_price_log_return,0))
asset_volatility_df = generate_asset_volatility('Quarter', log_returns)
positive_assets_list = removing_assets_with_negative_expected_return(stock_price_log_return,0)
stocks_initial_prices = stocks_initial_price(positive_assets_list)
portfolio_arihtmetics_df = portfolio_arihtmetics(log_returns,stocks_initial_prices).transpose()
In [7]:
print('\nExploratory Data Analysis (EDA)\n')
#index_adj_close_price_df.iloc[0] # first row
asset_daily_price(index_adj_close_price_df,number_of_asset) #Plotting the first 5 assets daily adj closed prices
plot_normalize_asset_daily_price(normalized_asset_daily_price_df,number_of_asset)  #Normalization of adj closed prices to 100
Exploratory Data Analysis (EDA)


Plotting the first 5 assets daily adj closed prices

The graphs show that the distribution of closing stock prices has the following characteristics:

-- Non-stationarity: Stock prices tend to increase over time, making the distribution dynamic and time-driven. This is in contrast to normal distributions which are stationary.
-- Fat tails (leptokurtosis): Stock price distributions often have more extreme values ​​(fat tails) compared to a normal distribution. This means that there are more significant price changes than a normal distribution would predict.
-- Skewedness: Stock prices can be asymmetrical, meaning that they are not symmetrical. For example, there may be more upward or downward price movements.

This leads us to calculate the logarithmic returns of assets

Assets log return and Volatility Calculation¶

In this section, we will calculate the assets log return instead of arithmetic return. The arithmetic return is the percentage change in the asset's price from one period to the next where as the log return of an asset over a period is calculated as the natural logarithm of the ratio of the ending price to the starting price.

$$ Arithmetic Return: R = \frac{P_t - P_{t-1}}{P_{t-1}} $$

. $$ Log Return: \text{Log Return} = \ln\left(\frac{P_t}{P_{t-1}}\right) $$.

Throughout this project, we will use asset log returns instead of arithmetic returns, simply because, in the upcoming sections we will perform stochastic simulation of the stock prices to calculate Profit & Loss, VaR, CVaR and stress testing. Log returns are commonly used in the financial literature to perform financial modeling like asset prices modeling over time, as prices cannot be negative but can increase indefinitely. Log returns are normally distributed with Fat-Tailed that make them more likely to predict extreme returns than assuming arithmetic returns to be normally distributed. As we know, stocks are traded with very high frequency over very short period of time and the form of their distributions are unknown as we can see in the plottings above. This leads to use log returns witch naturally account for continuous compounding and more accurate instead of arithmetic returns witch are based on simple interest. Furthermore, as opposed to arithmetic returns, log returns are additive meaning that you can add log returns over multiple periods to get the total log return.

In [8]:
print('\nAssets log return data frame\n')
display(log_returns)
print('\nAssets volatility data frame\n')
display(asset_volatility_df)
print('\nPortfolio arithmetics\n')
display(portfolio_arihtmetics_df)
#print('\nAssets log returns Distribution\n')
plot_assets_distribution(log_returns.iloc[:,:number_of_asset], 'log_returns','Frequency')
#print('\nAssets Volatility Distribution\n')
plot_assets_distribution(asset_volatility_df.iloc[:,:number_of_asset], 'Volatility','Frequency')
plotting_assets_log_returns(log_returns.iloc[:,:number_of_asset], 'Date','log_returns')
plotting_assets_volatility(asset_volatility_df.iloc[:,:number_of_asset], 'Date','Volatility')
Assets log return data frame

AEM AGI ATS BLX BMO BN BNS BTE BTO BYD ... TD TFII TPZ TRI TRP TVE WCN WFG WPM X
Date
2019-09-19 0.005959 0.028438 0.000000 -0.005571 0.001910 0.010767 0.000534 0.018018 -0.006152 -0.014676 ... 0.005576 0.000000 0.000000 0.002668 0.004330 0.000000 0.003438 0.000000 0.009932 -0.118386
2019-09-20 0.016807 0.015456 -0.006543 -0.003052 0.001362 -0.004812 0.000889 0.046520 -0.001235 -0.019909 ... 0.002256 0.000000 0.004710 -0.011914 0.015202 0.005486 0.003315 0.000000 0.002924 -0.022863
2019-09-23 0.022427 0.019743 0.000000 -0.002550 -0.005186 -0.014012 0.003372 -0.005698 -0.000927 -0.008154 ... 0.000000 0.002291 -0.001106 0.003440 0.005978 -0.005879 0.003964 -0.024681 0.024871 0.021053
2019-09-24 0.006364 0.008982 -0.002922 0.002041 -0.007554 -0.010592 0.006709 -0.064920 -0.013699 -0.022473 ... -0.003996 0.000000 -0.005550 0.005657 -0.002503 -0.000786 0.003401 0.004599 0.012383 -0.030347
2019-09-25 -0.024847 -0.053571 -0.006606 0.012662 0.009195 0.008897 0.003864 0.018127 0.014626 -0.008811 ... -0.002265 0.000000 -0.000556 0.005183 0.000000 -0.003942 -0.003292 0.000000 -0.036159 0.066812
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2024-09-09 0.011251 0.003309 0.032904 0.002256 0.008346 0.022295 0.015522 -0.006536 0.004310 0.010163 ... 0.018057 0.002572 0.004999 0.014097 0.008106 0.003559 0.012948 0.004000 0.009954 0.048379
2024-09-10 0.014681 0.029302 0.003042 -0.009706 -0.001567 0.001483 0.003115 -0.029952 -0.006163 -0.016714 ... -0.006529 -0.013214 0.000000 0.016763 -0.027570 -0.001778 -0.001731 -0.002055 0.014411 -0.049979
2024-09-11 0.002405 0.011167 0.006433 -0.025685 0.017460 0.017833 0.006007 0.013423 -0.001856 -0.007953 ... 0.010587 0.030752 -0.006113 0.003958 0.001964 0.001778 0.003837 -0.008379 -0.002359 0.067198
2024-09-12 0.034298 0.060360 -0.011763 0.007311 0.009205 0.019594 -0.000580 0.019803 0.007713 0.019516 ... 0.002751 0.001961 0.008879 0.009425 0.004567 0.000000 0.003231 0.017141 0.034818 0.039635
2024-09-13 0.015876 0.030923 -0.018879 0.017723 0.005038 0.008339 0.005975 -0.006557 0.005940 0.023286 ... 0.004996 0.000350 0.005510 -0.006119 0.009931 -0.000444 -0.001668 0.025173 0.019205 0.037570

1255 rows × 79 columns

Assets volatility data frame

AEM Volatility AGI Volatility ATS Volatility BLX Volatility BMO Volatility BN Volatility BNS Volatility BTE Volatility BTO Volatility BYD Volatility ... TD Volatility TFII Volatility TPZ Volatility TRI Volatility TRP Volatility TVE Volatility WCN Volatility WFG Volatility WPM Volatility X Volatility
Quarter
2020Q3 0.504936 0.722582 0.400987 0.657471 0.495752 0.512884 0.446284 1.090984 0.781376 0.931228 ... 0.437350 0.490838 0.768609 0.306863 0.488613 0.108714 0.334995 0.692859 0.460984 0.751607
2020Q4 0.517322 0.735122 0.415200 0.669080 0.503352 0.532221 0.454041 1.117460 0.794128 0.938931 ... 0.442245 0.497988 0.774649 0.310427 0.501494 0.119606 0.334636 0.702167 0.479239 0.758829
2021Q1 0.507920 0.710529 0.436002 0.645300 0.475903 0.523339 0.433218 1.103703 0.761213 0.888265 ... 0.419510 0.540971 0.704425 0.304597 0.476976 0.115931 0.318027 0.680943 0.501728 0.815657
2021Q2 0.394552 0.498487 0.390192 0.364704 0.264166 0.330288 0.238358 0.833577 0.422179 0.507479 ... 0.231210 0.415803 0.250732 0.210103 0.268063 0.105479 0.188678 0.478729 0.437271 0.789773
2021Q3 0.368136 0.462163 0.378344 0.279486 0.202586 0.292768 0.183985 0.719885 0.339572 0.426810 ... 0.179408 0.412925 0.200002 0.198581 0.235408 0.111091 0.147088 0.403321 0.393466 0.728007
2021Q4 0.333092 0.418524 0.360240 0.238818 0.185123 0.262962 0.171322 0.673490 0.311613 0.411541 ... 0.167922 0.416557 0.186838 0.194944 0.209064 0.109803 0.145107 0.368686 0.351394 0.692851
2022Q1 0.327136 0.390095 0.346108 0.217548 0.186195 0.250669 0.166026 0.610134 0.335228 0.405959 ... 0.172636 0.357399 0.176881 0.187809 0.184952 0.112337 0.158590 0.352844 0.305565 0.606421
2022Q2 0.351677 0.394403 0.385216 0.209204 0.202487 0.282420 0.183523 0.602856 0.386554 0.394990 ... 0.197151 0.393487 0.196335 0.188245 0.198772 0.119470 0.178486 0.368329 0.297601 0.545716
2022Q3 0.387415 0.430022 0.432768 0.226257 0.224788 0.310316 0.205239 0.653420 0.404918 0.409088 ... 0.218834 0.423243 0.219101 0.197006 0.227495 0.121693 0.203167 0.421957 0.322888 0.542057
2022Q4 0.431849 0.460790 0.459860 0.256554 0.253014 0.344514 0.235036 0.668572 0.412929 0.408188 ... 0.242017 0.446166 0.233564 0.210808 0.269814 0.120017 0.226986 0.450482 0.363219 0.558721
2023Q1 0.427578 0.447831 0.460281 0.267589 0.255157 0.357661 0.245553 0.654676 0.383014 0.383679 ... 0.241562 0.433118 0.224136 0.212842 0.291662 0.134372 0.230723 0.454679 0.370638 0.548080
2023Q2 0.403774 0.420084 0.415433 0.289826 0.250314 0.348234 0.242711 0.624774 0.380361 0.338813 ... 0.235099 0.389740 0.205327 0.206625 0.291595 0.133922 0.217230 0.416975 0.363502 0.517238
2023Q3 0.368457 0.369703 0.360865 0.282366 0.234860 0.336513 0.230422 0.525978 0.362044 0.282926 ... 0.222803 0.363645 0.174805 0.213448 0.287583 0.121043 0.198279 0.345792 0.342282 0.523164
2023Q4 0.318009 0.329529 0.331048 0.265068 0.208913 0.325726 0.217096 0.471112 0.353394 0.267371 ... 0.203985 0.332438 0.144614 0.200450 0.252149 0.112219 0.182783 0.301782 0.304097 0.501732
2024Q1 0.301018 0.325923 0.323614 0.270098 0.203953 0.309186 0.209275 0.447827 0.351019 0.262426 ... 0.200144 0.324681 0.126838 0.194390 0.220564 0.098177 0.176920 0.300753 0.290663 0.502309
2024Q2 0.294487 0.334375 0.331103 0.256207 0.202602 0.290805 0.199596 0.422513 0.285629 0.286738 ... 0.192505 0.303650 0.120100 0.196177 0.203598 0.085080 0.173198 0.295778 0.294560 0.487154
2024Q3 0.300283 0.336350 0.331269 0.264102 0.216194 0.287105 0.195979 0.420838 0.257450 0.309874 ... 0.187867 0.282098 0.133813 0.181548 0.182768 0.083004 0.171691 0.296492 0.296150 0.438869

17 rows × 79 columns

Portfolio arithmetics

mu expected_return variance Sigmas(volatilities) modifiy shape(Er)/𝝈 initial price
AEM 0.000403 0.000602 0.024527 0.016433 50.151981
AGI 0.000994 0.000922 0.030372 0.032740 5.939981
ATS 0.000496 0.000570 0.023885 0.020757 13.800000
BLX 0.000616 0.000562 0.023698 0.025986 14.192307
BMO 0.000303 0.000350 0.018704 0.016185 58.515537
... ... ... ... ... ...
TVE 0.000003 0.000047 0.006884 0.000416 22.429239
WCN 0.000606 0.000192 0.013850 0.043764 86.776352
WFG 0.000659 0.000805 0.028367 0.023226 39.590134
WPM 0.000719 0.000529 0.023003 0.031271 25.364801
X 0.000873 0.001493 0.038644 0.022590 12.059963

79 rows × 5 columns

3. Dimensionality Reduction & Portfolio Construction using PCA, Correlation Analysis and Hierarchical Clustering¶

In this section, we will stack Principal Component Analysis (PCA), Correlation Analysis and Hierarchical Clustering methods to create a diversified portfolio containing only the most important assets with less correlation. The Principal Component Analysis (PCA) is a dimensionality reduction technique aimed at reducing the number of assets. The PCA process will take the log returns of the assets as input and will produce a correlation matrix as output by transforming the original set of assets into a smaller set of uncorrelated variables called principal components. These components capture the majority of the variance in the data. The correlation analysis process will use the correlation matrix produced by PCA and will analyze the correlation between the most important assets selected by PCA. the highly correlated assets that may be redundant will be dropped. The remaining assets are expected to maintain a well-diversified portfolio.

In [9]:
def generate_correlation_matrix(log_returns):
    return log_returns.corr(method='pearson')   

def get_selected_assets_volatility(assets_volatility_df, selected_content_ticker_list):
    
    for col in list(assets_volatility_df.columns):
        assets_volatility_df = assets_volatility_df.rename(columns={col: col.replace(' Volatility', '')})
        
    return assets_volatility_df[selected_content_ticker_list]

#Selecting most important economic factors
#-------------------------------------------------------------------------------
 #Principal Components Analysis(PCA) to select most importance assets   
#-------------------------------------------------------------------------------



def selecting_important_item_PCA_treshold_method(matrix,threshold):
    
    return matrix[(matrix.abs() > threshold).any(axis=1)].index.to_list()

def selecting_important_item_corr_treshold_method(matrix,threshold):# to be change to >=matrix < threshold).any(axis=1)
    
    return matrix[(matrix < threshold).any(axis=1)].index.to_list()

def setting_PCA_for_assets_selection(log_returns_df):
    # economic indicators dataset
   
    # Standardizing the data
    scaler = StandardScaler()
    scaled_data_df = scaler.fit_transform(log_returns_df)

    # Applying PCA
    all_pca = PCA(n_components=None)  # Use all components to find the best number of important indicators
    all_principal_components = all_pca.fit_transform(scaled_data_df)

    # Explained variance
    explained_variance = all_pca.explained_variance_ratio_

    # Principal Component Loadings(coefficients)
    loadings_matrix = all_pca.components_

    # Create a DataFrame for loadings 
    loadings_matrix_df = pd.DataFrame(loadings_matrix.T, columns=[f'PC{i+1}' for i in range(loadings_matrix.shape[0])], 
                                      index=log_returns.columns)

   
    return loadings_matrix_df, explained_variance

#----------------------

def get_num_components(explained_variance,cumulative_variance_treshold = 0.9):    
    # Determine the number of components explaining the cumulative varience treshold of the variance
    cumulative_variance = explained_variance.cumsum()
    return  (cumulative_variance <= cumulative_variance_treshold).sum() + 1

def select_top_components_df(loadings_matrix_df, num_components, threshold_for_high_loadings = 0.5):
    # Select top components
    return loadings_matrix_df.iloc[:, :num_components]
     
def select_top_indicators_df(loadings_matrix_df, num_components, threshold_for_high_loadings = 0.5):
    # Select top components
    selected_components_df = loadings_matrix_df.iloc[:, :num_components]
    # Find indicators with high loadings
    return selected_components_df[(selected_components_df.abs() > threshold_for_high_loadings).any(axis=1)]

    
def plot_explained_variance_for_assets_selection(loadings_matrix_df, explained_variance):

    # Print explained variance
    
    explained_variance_df = pd.DataFrame(explained_variance).T
    explained_variance_df.columns = loadings_matrix_df.columns
    print('\nexplained_variance_df\n')
    display(explained_variance_df)
    
    # Plotting the explained variance
    plt.figure(figsize=(10, 6))
    plt.bar(range(1, len(explained_variance) + 1), explained_variance, alpha=0.5, align='center', label='individual explained variance')
    plt.step(range(1, len(explained_variance) + 1), np.cumsum(explained_variance), where='mid', label='cumulative explained variance')
    plt.xlabel('Principal Components')
    plt.ylabel('Explained Variance Ratio')
    plt.title('Explained Variance by Principal Components')
    plt.legend(loc='best')
    #plt.show()
    

#----------------------
def print_explained_variance(loadings_matrix_df, explained_variance,cumulative_variance_treshold, num_components, threshold_for_highest_loadings):
     # Print explained variance
    print('\nloadings_matrix_df\n')
    display(loadings_matrix_df)
    num_components = get_num_components(explained_variance,cumulative_variance_treshold)
    top_components_df = select_top_components_df(loadings_matrix_df, num_components, threshold_for_highest_loadings)
    print('\ntop_components_df\n')
    display(top_components_df)
    print('\nMost important assets with top components\n')
    top_indicators_df = select_top_indicators_df(loadings_matrix_df, num_components, threshold_for_highest_loadings)
    display(top_indicators_df)
    
    
    
#def get_all_assets_corr_matrix(log_returns_df, cumulative_variance_treshold = 1, threshold_for_highest_loadings = 0.5 ):
            
#    all_assets_matrix =  generate_correlation_matrix(log_returns_df)
#    return all_assets_matrix

def get_most_important_assets_log_returns_df_PCA_method(log_returns_df, most_important_assets_log_returns_list_PCA):                 
    return log_returns_df[most_important_assets_log_returns_list_PCA]
        
def get_most_important_assets_corr_matrix_PCA_method(most_important_assets_log_returns_df_PCA_method):
    #PCA to select most important portfolio assets
    return generate_correlation_matrix(most_important_assets_log_returns_df_PCA_method)
    

#----------------------------------------------------------------------------------------------------------
#Stacking  Correlation Analysis and Principal Components Analysis(PCA) to select most divesified assets   
#-------------------------------------------------------------------------------------------------------------

def get_most_diversify_portfolio_asset_log_return_df_stacking_PCA_and_corr(log_returns_df,most_diversify_portfolio_assets_list_stacking_PCA_and_corr):
    return log_returns_df[most_diversify_portfolio_assets_list_stacking_PCA_and_corr]

def get_stacking_PCA_and_corr_method_matrix_to_diversify_portfolio(most_diversify_portfolio_assets_df_PCA_corr_method):   
    return generate_correlation_matrix(most_diversify_portfolio_assets_df_PCA_corr_method)

#---------------------------------------------------------------------------------------------------------------------------------
#Stacking   Hierarchical Clustering, Correlation Analysis and Principal Components Analysis(PCA)  to select most divesified assets   
#---------------------------------------------------------------------------------------------------------------------------------

def get_most_diversify_portfolio_asset_hierarchical_clustering_method(returns, g, distance_threshold= 1.5):
    #          We will use this method to efficiently reduce the number of assets in our portfolio
    #          by selecting a representative asset from each cluster identified in the clustermap
    
    # Extract the linkage matrix from the clustermap
    linkage_matrix = g.dendrogram_row.linkage
    
    # Get cluster assignments
    clusters = fcluster(linkage_matrix, t=distance_threshold, criterion='distance')

    # The number of clusters
    num_clusters = len(np.unique(clusters))
        
    # Display the cluster assignments
    asset_clusters = pd.DataFrame({'Asset': returns.columns[g.dendrogram_row.reordered_ind], 'Cluster': clusters})
    return asset_clusters

# Function to find the asset closest to the centroid of each cluster:this function is generated by chartGPT
def select_most_divesified_portfolio_assets_stacking_hierarchical_clustering_method(log_returns_df, asset_clusters):
    most_divesified_portfolio_assets_list_stacking_hierarchical_clustering_method = []
    for cluster in asset_clusters['Cluster'].unique():
        cluster_assets = asset_clusters[asset_clusters['Cluster'] == cluster]['Asset']
        cluster_returns = log_returns_df[cluster_assets].mean(axis=1)  # Compute the centroid
        distances = log_returns_df[cluster_assets].apply(lambda x: np.linalg.norm(x - cluster_returns), axis=0)
        representative_asset = distances.idxmin()
        most_divesified_portfolio_assets_list_stacking_hierarchical_clustering_method.append(representative_asset)
    return most_divesified_portfolio_assets_list_stacking_hierarchical_clustering_method


def create_clustermap(assets_matrix):  
    g = sns.clustermap(assets_matrix,  method = 'ward', metric='euclidean', cmap   = 'RdBu',  annot  = True,  annot_kws = {'size': 8},
                      row_cluster=True, col_cluster=True)
    plt.close()
    return g
     

def select_most_divesified_portfolio_assets_df_stacking_hierarchical_clustering_method(log_returns_df, 
                                                                    most_divesified_portfolio_assets_list_stacking_hierarchical_clustering_method):
    return log_returns_df[most_divesified_portfolio_assets_list_stacking_hierarchical_clustering_method]

def select_most_divesified_portfolio_assets_matrix_stacking_hierarchical_clustering_method(most_divesified_portfolio_assets_df_stacking_hierarchical_clustering_method):
    return generate_correlation_matrix(most_divesified_portfolio_assets_df_stacking_hierarchical_clustering_method)

#------------------------------------------------------------------------------------------------------------------------------------------       
def plotting_selected_assets_corr_mat_clustermap(assets_matrix, title, dendrogram = True):
             
    g = sns.clustermap(assets_matrix,  method = 'ward', metric='euclidean', cmap   = 'RdBu',  annot  = True,  annot_kws = {'size': 8},
                      row_cluster=dendrogram, col_cluster=dendrogram)
    plt.subplots_adjust(top=0.85)
    plt.setp(g.ax_heatmap.get_xticklabels(), rotation=90)
    plt.setp(g.ax_heatmap.get_yticklabels(), rotation=360)
    g.cax.set_position([1.02, 0.2, 0.03, 0.4])  # [left, bottom, width, height]
    g.cax.set_ylabel('Correlation Coefficient', rotation=270, labelpad=15)  # Rotate label
    g.fig.suptitle(title, y=0.9, fontsize=12)

 #----------------------------------------------------------------------------------------------------------
 #                              Most divesified Assets Daily Volatility   
#-------------------------------------------------------------------------------------------------------------
#selected assets daily volatility
def get_selected_assets_volatility_df_from_Stack_Corr_PCA_method(selected_assets_adj_close_price_log_return_df, frequency_date_column = 'day'):
                          
    frequency = frequency_date_column[0].upper() 
    
    #Market volatility
        
    selected_assets_volatility_df = selected_assets_adj_close_price_log_return_df.rolling(center=False,window= 252).std() * np.sqrt(252)
    for col in list(selected_assets_volatility_df.columns):
        selected_assets_volatility_df = selected_assets_volatility_df.rename(columns={col: col+' Volatility'})
    
    selected_assets_volatility_df = selected_assets_volatility_df.dropna(axis=0)
    
    if frequency == 'D':
        selected_assets_volatilities = selected_assets_volatility_df
    else:
        selected_assets_volatility_df[frequency_date_column] = pd.to_datetime(selected_assets_volatility_df.index, format = '%m/%Y')
        selected_assets_volatility_df[frequency_date_column] = selected_assets_volatility_df[frequency_date_column].dt.to_period(frequency)
        
        #market_adj_close_price_log_return_frequency_df = market_volatility_df
        selected_assets_volatility_df.set_index(frequency_date_column, inplace=True)
        selected_assets_volatilities = selected_assets_volatility_df.groupby(frequency_date_column).mean()
        selected_assets_volatilities = round(selected_assets_volatilities,1)
        selected_assets_volatilities = selected_assets_volatilities.dropna(axis=0)
        
    return selected_assets_volatilities

#-------------------------portfolio arithmetics-summary------------------------

#merge content data frame and the weght data frame
def most_diversified_portfolio_arithmetics(most_divesified_portfolio_arihtmetics_df,  index_content_df ):
    most_divesified_portfolio_arihtmetics_df_reset = most_divesified_portfolio_arihtmetics_df.reset_index()
    most_divesified_portfolio_arihtmetics_df_reset.rename(columns={'index': 'Ticker'}, inplace=True)
    most_divesified_portfolio_arihtmetics_df_details = pd.merge(most_divesified_portfolio_arihtmetics_df_reset, index_content_df, how="inner", on=["Ticker"]) 
    return most_divesified_portfolio_arihtmetics_df_details

  
#----------------------------------------------- Plot the scatter matrix with regression lines------------------------------------------
def plot_scatter_matrix(df):
    #sns.pairplot(df, kind='reg',  height=3, aspect=3)
    #height=3, aspect=1.2
    g = sns.pairplot(df, kind='reg', height=3, aspect=1.2)
    g.fig.set_size_inches(12, 8) 
    #plt.suptitle("Pairplot with Regression Lines", y=1.02, fontsize=12)
    plt.suptitle("Scatter Matrix for Stock Prices with Regression Lines", y=1.02, fontsize=12, fontweight='bold', color='blue')
    # Adjust the axis font size
    plt.tick_params(axis='both', which='major', labelsize=50)
    
    #plt.show()   

    
#------------------------------------------------------plotting portfolio structure----------

def plot_portfolio_structure_pie_chart(most_divesified_portfolio_arihtmetics_df_details):
 
    most_divesified_portfolio_arihtmetics_df_details = most_divesified_portfolio_arihtmetics_df_details.sort_values(by='modifiy shape(Er)/𝝈',ascending=True)    
    industry_labels = most_divesified_portfolio_arihtmetics_df_details['Industry'].values
    sector_labels = most_divesified_portfolio_arihtmetics_df_details['Sector'].values
    modifiy_sharpe_values = most_divesified_portfolio_arihtmetics_df_details['modifiy shape(Er)/𝝈'].values
    
    # Create subplots: use 'domain' type for Pie subplot
    fig = make_subplots(rows=1, cols=2, specs=[[{'type':'domain'}, {'type':'domain'}]])
    
    fig.add_trace(go.Pie(labels=industry_labels, values=modifiy_sharpe_values, name="Industry",
                        legendgroup="Industry",  # this can be any string, not just "group"
                        legendgrouptitle_text="Industry"), 1, 1)
    fig.add_trace(go.Pie(labels=sector_labels, values=modifiy_sharpe_values, name="Sector",
                        legendgroup="Sector",  # this can be any string, not just "group"
                        legendgrouptitle_text="Sector"), 1, 2)

    # Use `hole` to create a donut-like pie chart
    fig.update_traces(hole=.5, hoverinfo="label+percent+name")

    fig.update_layout(
    title_text=" Asset - Risk-Adjusted Return (modifiy sharpe(Er)/risk by Industry & Sector)",
    # Add annotations in the center of the donut pies.
    annotations=[dict(text='Industry', x=0.14, y=0.5, font_size=20, showarrow=False),
                 dict(text='Sector', x=0.84, y=0.5, font_size=20, showarrow=False)],
    height=500, 
    width=800,
    autosize=True,
    margin=dict(t=0, b=0, l=50, r=0),
    legend_tracegroupgap = 0,
    legend=dict(    
                    orientation="v",
                    yanchor="bottom",
                    y=0,
                    xanchor="right",
                    x=1.5),
     title=dict(
                    y=0.9,
                    x=0.1,
                    xanchor= 'left',
                    yanchor= 'top'))
    
    fig.show()
    
def plot_portfolio_structure( p_most_divesified_portfolio_arihtmetics_df_details):
    fig, ax =plt.subplots(figsize=(12, 6))
      
    l_most_divesified_portfolio_arihtmetics_df_details = p_most_divesified_portfolio_arihtmetics_df_details.sort_values(by='modifiy shape(Er)/𝝈',ascending=True)
    column_list = [':      ' for i in range(len(l_most_divesified_portfolio_arihtmetics_df_details))]
    column_df = pd.DataFrame({'colum': column_list})
    
    modifiy_sharpe_values = l_most_divesified_portfolio_arihtmetics_df_details['modifiy shape(Er)/𝝈']
    Tickers = l_most_divesified_portfolio_arihtmetics_df_details['Sector'] + column_df['colum'] + \
                        l_most_divesified_portfolio_arihtmetics_df_details['Industry'] + column_df['colum'] + \
                        l_most_divesified_portfolio_arihtmetics_df_details['Company'] + \
                        column_df['colum'] + l_most_divesified_portfolio_arihtmetics_df_details['Ticker'] 
    
    bar_container= ax.barh(Tickers, modifiy_sharpe_values*100)
    ax.axes.get_xaxis().set_visible(False)
    # setting label of y-axis
    ax.set_ylabel("Asset Tickers")
    # setting label of x-axis
    ax.set_xlabel("Asset modifiy sharpe(Er)/𝝈") 
    ax.set_title(" Most Diversified portfolio Structure: Asset Risk-Adjusted Return (modifiy sharpe(Er)/risk)",fontsize=22,  horizontalalignment='right',fontweight='roman')
    ax.bar_label(bar_container, fmt='{:,.1f}%')
    
    
    plt.show()
    #Asset return pie chart
    plot_portfolio_structure_pie_chart( p_most_divesified_portfolio_arihtmetics_df_details) 
In [10]:
# Data setting- 

cumulative_variance_treshold = 1.0
threshold_for_highest_loadings = 0.5
correlation_coefficient_treshold = 0.3 
distance_threshold= 0.5 # parameter to determining the number of clusters in hierarchical clustering.

#-------------------------------------PCA_method----------------------
loadings_matrix_df, explained_variance  = setting_PCA_for_assets_selection(log_returns)
num_components = get_num_components(explained_variance,cumulative_variance_treshold)
top_components_df = select_top_components_df(loadings_matrix_df, num_components, threshold_for_highest_loadings)
top_indicators_df = select_top_indicators_df(loadings_matrix_df, num_components, threshold_for_highest_loadings)
most_important_assets_list_PCA_treshold_method = selecting_important_item_PCA_treshold_method(top_indicators_df, threshold_for_highest_loadings)
most_important_assets_log_returns_df_PCA_method = get_most_important_assets_log_returns_df_PCA_method(log_returns, most_important_assets_list_PCA_treshold_method)
most_important_assets_corr_matrix_PCA_method = get_most_important_assets_corr_matrix_PCA_method(most_important_assets_log_returns_df_PCA_method)

#----------------------------------------------------Correlation method-------------------------
selected_assets_list_correlation_method = selecting_important_item_corr_treshold_method(most_important_assets_corr_matrix_PCA_method, 
                                                                                            correlation_coefficient_treshold)
selected_assets_log_return_df_correlation_method = log_returns[selected_assets_list_correlation_method]
           
selected_assets_log_return_Corr_matrix_correlation_method =  generate_correlation_matrix(selected_assets_log_return_df_correlation_method)

    
#-------------------------stack_PCA_corr_method---------------------------------------------------------------

most_diversify_portfolio_assets_list_stack_PCA_corr_method = selecting_important_item_corr_treshold_method(selected_assets_log_return_Corr_matrix_correlation_method,
                                                                                              correlation_coefficient_treshold)
    
         
most_diversify_portfolio_asset_log_return_df_stacking_PCA_and_corr = get_most_diversify_portfolio_asset_log_return_df_stacking_PCA_and_corr(log_returns,
                                                                                                         most_diversify_portfolio_assets_list_stack_PCA_corr_method)
        
most_diversify_portfolio_assets_matrix_stacking_PCA_and_corr_method = \
                        get_stacking_PCA_and_corr_method_matrix_to_diversify_portfolio(most_diversify_portfolio_asset_log_return_df_stacking_PCA_and_corr )

#------------------------------stacking_hierarchical_clustering_method------------------------------------
#get clustermap
clustermap=  create_clustermap(most_diversify_portfolio_assets_matrix_stacking_PCA_and_corr_method);
    
#get asset clusters
asset_clusters = get_most_diversify_portfolio_asset_hierarchical_clustering_method(most_diversify_portfolio_asset_log_return_df_stacking_PCA_and_corr, 
                                                                                       clustermap, distance_threshold)
# Get the representative assets
most_diversify_portfolio_assets_list = \
            select_most_divesified_portfolio_assets_stacking_hierarchical_clustering_method(most_diversify_portfolio_asset_log_return_df_stacking_PCA_and_corr,
                                                                                            asset_clusters)
                                                                                     
most_diversify_portfolio_assets_log_returns_df = \
                select_most_divesified_portfolio_assets_df_stacking_hierarchical_clustering_method(log_returns, most_diversify_portfolio_assets_list)


most_diversify_portfolio_assets_corr_matrix = \
                select_most_divesified_portfolio_assets_matrix_stacking_hierarchical_clustering_method(most_diversify_portfolio_assets_log_returns_df)

selected_assets_volatility_df_stacking_corr_PCA_method = \
                   get_selected_assets_volatility_df_from_Stack_Corr_PCA_method(most_diversify_portfolio_assets_log_returns_df, frequency_date_column = 'day')


most_diversify_portfolio_assets_initial_prices =  stocks_initial_prices[most_diversify_portfolio_assets_list]
most_divesified_portfolio_arihtmetics_df = portfolio_arihtmetics(most_diversify_portfolio_assets_log_returns_df, 
                                                                     most_diversify_portfolio_assets_initial_prices).transpose()
    
most_divesified_portfolio_arihtmetics_df_details  = most_diversified_portfolio_arithmetics(most_divesified_portfolio_arihtmetics_df,  index_content_df) 
In [11]:
 #-------Data printing and recording --------------------------------------------------
warnings.filterwarnings("ignore")    
print('\nInitial assets log returns\n')
display(log_returns) 
          
plot_explained_variance_for_assets_selection(loadings_matrix_df, explained_variance)
print_explained_variance(loadings_matrix_df, explained_variance,cumulative_variance_treshold, num_components, threshold_for_highest_loadings)
          
print('\nMost Important Assets Log returns  using PCA\n')
display(most_important_assets_log_returns_df_PCA_method)
plotting_selected_assets_corr_mat_clustermap(most_important_assets_corr_matrix_PCA_method, 'Most Important Assets Correlation Matrix PCA Method')
    
print('\nMost Important Assets Log returns  using Correlation method\n')
display(selected_assets_log_return_df_correlation_method)
plotting_selected_assets_corr_mat_clustermap(selected_assets_log_return_Corr_matrix_correlation_method, 
                                                 'Most Diversified Assets Correlation Matrix - Correlation Method')        
          
print('\nMost Diversified Assets Log returns  using Stack Correlation Matrix/PCA Method\n')
display(most_diversify_portfolio_asset_log_return_df_stacking_PCA_and_corr)    
plotting_selected_assets_corr_mat_clustermap(most_diversify_portfolio_assets_matrix_stacking_PCA_and_corr_method, 
                                                 'Most Diversified Assets Correlation Matrix - stacking Correlation Analysis/PCA Method')
    
print('\nMost Diversified Assets Log returns  using Stacking Hierarchical Clustering, Correlation Analysis & PCA Method\n')
display(most_diversify_portfolio_assets_log_returns_df)
plotting_selected_assets_corr_mat_clustermap(most_diversify_portfolio_assets_corr_matrix, 
                                                 'Most Diversified Assets Correlation Matrix - Stacking Hierarchical_clustering, Correlation Analysis & PCA Method')
print('\nDiversified Portfolio Assets Volatility \n')
display(selected_assets_volatility_df_stacking_corr_PCA_method)

plot_scatter_matrix(most_diversify_portfolio_assets_log_returns_df)
    
print('\nMost Diversified Portfolio arithmetics details\n')
display(most_divesified_portfolio_arihtmetics_df_details)
plot_portfolio_structure( most_divesified_portfolio_arihtmetics_df_details)
Initial assets log returns

AEM AGI ATS BLX BMO BN BNS BTE BTO BYD ... TD TFII TPZ TRI TRP TVE WCN WFG WPM X
Date
2019-09-19 0.005959 0.028438 0.000000 -0.005571 0.001910 0.010767 0.000534 0.018018 -0.006152 -0.014676 ... 0.005576 0.000000 0.000000 0.002668 0.004330 0.000000 0.003438 0.000000 0.009932 -0.118386
2019-09-20 0.016807 0.015456 -0.006543 -0.003052 0.001362 -0.004812 0.000889 0.046520 -0.001235 -0.019909 ... 0.002256 0.000000 0.004710 -0.011914 0.015202 0.005486 0.003315 0.000000 0.002924 -0.022863
2019-09-23 0.022427 0.019743 0.000000 -0.002550 -0.005186 -0.014012 0.003372 -0.005698 -0.000927 -0.008154 ... 0.000000 0.002291 -0.001106 0.003440 0.005978 -0.005879 0.003964 -0.024681 0.024871 0.021053
2019-09-24 0.006364 0.008982 -0.002922 0.002041 -0.007554 -0.010592 0.006709 -0.064920 -0.013699 -0.022473 ... -0.003996 0.000000 -0.005550 0.005657 -0.002503 -0.000786 0.003401 0.004599 0.012383 -0.030347
2019-09-25 -0.024847 -0.053571 -0.006606 0.012662 0.009195 0.008897 0.003864 0.018127 0.014626 -0.008811 ... -0.002265 0.000000 -0.000556 0.005183 0.000000 -0.003942 -0.003292 0.000000 -0.036159 0.066812
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2024-09-09 0.011251 0.003309 0.032904 0.002256 0.008346 0.022295 0.015522 -0.006536 0.004310 0.010163 ... 0.018057 0.002572 0.004999 0.014097 0.008106 0.003559 0.012948 0.004000 0.009954 0.048379
2024-09-10 0.014681 0.029302 0.003042 -0.009706 -0.001567 0.001483 0.003115 -0.029952 -0.006163 -0.016714 ... -0.006529 -0.013214 0.000000 0.016763 -0.027570 -0.001778 -0.001731 -0.002055 0.014411 -0.049979
2024-09-11 0.002405 0.011167 0.006433 -0.025685 0.017460 0.017833 0.006007 0.013423 -0.001856 -0.007953 ... 0.010587 0.030752 -0.006113 0.003958 0.001964 0.001778 0.003837 -0.008379 -0.002359 0.067198
2024-09-12 0.034298 0.060360 -0.011763 0.007311 0.009205 0.019594 -0.000580 0.019803 0.007713 0.019516 ... 0.002751 0.001961 0.008879 0.009425 0.004567 0.000000 0.003231 0.017141 0.034818 0.039635
2024-09-13 0.015876 0.030923 -0.018879 0.017723 0.005038 0.008339 0.005975 -0.006557 0.005940 0.023286 ... 0.004996 0.000350 0.005510 -0.006119 0.009931 -0.000444 -0.001668 0.025173 0.019205 0.037570

1255 rows × 79 columns

explained_variance_df

PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10 ... PC70 PC71 PC72 PC73 PC74 PC75 PC76 PC77 PC78 PC79
0 0.403539 0.100359 0.040445 0.03251 0.027595 0.021537 0.01613 0.015612 0.01322 0.01211 ... 0.001581 0.001549 0.001374 0.001331 0.001322 0.001241 0.000908 0.000862 0.000556 0.000306

1 rows × 79 columns

loadings_matrix_df

PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10 ... PC70 PC71 PC72 PC73 PC74 PC75 PC76 PC77 PC78 PC79
AEM -0.065669 0.277218 -0.010885 -0.037676 -0.045264 0.033069 -0.078202 0.056256 -0.058732 -0.033783 ... -0.151400 0.035509 -0.126888 -0.056302 0.066494 0.003332 -0.003731 -0.001027 -0.009852 0.017048
AGI -0.058055 0.293144 -0.008136 -0.050313 -0.048107 0.083242 -0.011359 -0.019832 -0.021472 0.009000 ... 0.200848 -0.142541 0.064375 -0.078013 -0.121092 0.010274 -0.062080 -0.021057 -0.011170 -0.018934
ATS -0.073845 -0.003196 0.015798 0.123400 0.075143 -0.140624 0.226966 -0.039264 -0.099206 -0.295643 ... -0.008172 0.016202 -0.001900 0.006657 0.003255 -0.004879 0.021869 -0.019858 -0.006765 0.001025
BLX -0.105663 -0.057798 -0.041605 0.040873 -0.132797 0.298437 -0.075353 -0.015556 -0.119774 0.093928 ... 0.020047 -0.048509 0.006170 -0.041851 -0.002258 0.009578 -0.002888 -0.048743 0.013112 -0.015189
BMO -0.152284 -0.072828 -0.059178 -0.050535 0.045357 -0.063581 0.165802 0.092483 -0.009710 0.068572 ... -0.065963 0.188700 0.066368 -0.695366 -0.294240 -0.254406 -0.047065 -0.062295 0.034963 -0.000135
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
TVE -0.034162 0.019887 0.083680 0.052872 -0.134708 0.027806 -0.092218 -0.048004 0.667672 -0.072657 ... 0.015626 -0.004245 0.017087 -0.012480 -0.006102 0.001055 -0.010073 -0.019879 0.007495 0.015122
WCN -0.100774 -0.013572 0.166350 -0.181940 0.083366 -0.054607 -0.242837 0.024668 0.023151 0.059248 ... -0.054925 -0.039940 0.050932 0.003638 -0.020457 0.050221 -0.007624 0.002607 -0.004426 -0.014030
WFG -0.111501 -0.000092 -0.014839 0.011644 0.001620 -0.054287 0.179568 0.018910 0.121690 -0.226637 ... 0.018991 -0.003166 0.008847 -0.037237 -0.000709 -0.013008 -0.008995 -0.022354 0.004389 0.012087
WPM -0.069893 0.285385 0.016996 -0.052283 0.026792 -0.037374 0.019060 -0.001279 -0.071231 -0.023222 ... 0.256976 -0.020504 -0.040127 -0.189320 -0.096679 0.058365 -0.016540 -0.006234 -0.304184 -0.020128
X -0.091522 0.001560 -0.102661 0.066105 0.110150 0.183061 0.064658 -0.219873 0.155924 -0.025309 ... -0.011965 -0.010316 0.013302 0.029585 -0.013110 0.005908 0.001931 -0.030683 0.000352 0.007323

79 rows × 79 columns

top_components_df

PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10 ... PC70 PC71 PC72 PC73 PC74 PC75 PC76 PC77 PC78 PC79
AEM -0.065669 0.277218 -0.010885 -0.037676 -0.045264 0.033069 -0.078202 0.056256 -0.058732 -0.033783 ... -0.151400 0.035509 -0.126888 -0.056302 0.066494 0.003332 -0.003731 -0.001027 -0.009852 0.017048
AGI -0.058055 0.293144 -0.008136 -0.050313 -0.048107 0.083242 -0.011359 -0.019832 -0.021472 0.009000 ... 0.200848 -0.142541 0.064375 -0.078013 -0.121092 0.010274 -0.062080 -0.021057 -0.011170 -0.018934
ATS -0.073845 -0.003196 0.015798 0.123400 0.075143 -0.140624 0.226966 -0.039264 -0.099206 -0.295643 ... -0.008172 0.016202 -0.001900 0.006657 0.003255 -0.004879 0.021869 -0.019858 -0.006765 0.001025
BLX -0.105663 -0.057798 -0.041605 0.040873 -0.132797 0.298437 -0.075353 -0.015556 -0.119774 0.093928 ... 0.020047 -0.048509 0.006170 -0.041851 -0.002258 0.009578 -0.002888 -0.048743 0.013112 -0.015189
BMO -0.152284 -0.072828 -0.059178 -0.050535 0.045357 -0.063581 0.165802 0.092483 -0.009710 0.068572 ... -0.065963 0.188700 0.066368 -0.695366 -0.294240 -0.254406 -0.047065 -0.062295 0.034963 -0.000135
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
TVE -0.034162 0.019887 0.083680 0.052872 -0.134708 0.027806 -0.092218 -0.048004 0.667672 -0.072657 ... 0.015626 -0.004245 0.017087 -0.012480 -0.006102 0.001055 -0.010073 -0.019879 0.007495 0.015122
WCN -0.100774 -0.013572 0.166350 -0.181940 0.083366 -0.054607 -0.242837 0.024668 0.023151 0.059248 ... -0.054925 -0.039940 0.050932 0.003638 -0.020457 0.050221 -0.007624 0.002607 -0.004426 -0.014030
WFG -0.111501 -0.000092 -0.014839 0.011644 0.001620 -0.054287 0.179568 0.018910 0.121690 -0.226637 ... 0.018991 -0.003166 0.008847 -0.037237 -0.000709 -0.013008 -0.008995 -0.022354 0.004389 0.012087
WPM -0.069893 0.285385 0.016996 -0.052283 0.026792 -0.037374 0.019060 -0.001279 -0.071231 -0.023222 ... 0.256976 -0.020504 -0.040127 -0.189320 -0.096679 0.058365 -0.016540 -0.006234 -0.304184 -0.020128
X -0.091522 0.001560 -0.102661 0.066105 0.110150 0.183061 0.064658 -0.219873 0.155924 -0.025309 ... -0.011965 -0.010316 0.013302 0.029585 -0.013110 0.005908 0.001931 -0.030683 0.000352 0.007323

79 rows × 79 columns

Most important assets with top components

PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10 ... PC70 PC71 PC72 PC73 PC74 PC75 PC76 PC77 PC78 PC79
AGI -0.058055 0.293144 -0.008136 -0.050313 -0.048107 0.083242 -0.011359 -0.019832 -0.021472 0.009000 ... 0.200848 -0.142541 0.064375 -0.078013 -0.121092 0.010274 -0.062080 -0.021057 -0.011170 -0.018934
ATS -0.073845 -0.003196 0.015798 0.123400 0.075143 -0.140624 0.226966 -0.039264 -0.099206 -0.295643 ... -0.008172 0.016202 -0.001900 0.006657 0.003255 -0.004879 0.021869 -0.019858 -0.006765 0.001025
BMO -0.152284 -0.072828 -0.059178 -0.050535 0.045357 -0.063581 0.165802 0.092483 -0.009710 0.068572 ... -0.065963 0.188700 0.066368 -0.695366 -0.294240 -0.254406 -0.047065 -0.062295 0.034963 -0.000135
BN -0.147581 -0.049083 0.067809 -0.010327 0.068137 -0.009248 0.077914 0.100975 0.069761 0.030375 ... -0.024295 0.050163 -0.147035 0.029083 0.012623 -0.131303 -0.120203 -0.041728 0.020601 0.002664
CIX -0.055727 0.012151 0.005574 0.033036 -0.082174 0.201782 -0.029220 -0.161301 0.198752 0.390850 ... -0.007709 -0.004708 0.004186 0.011902 -0.018790 -0.012573 0.003734 -0.008440 -0.019901 -0.001135
CNQ -0.129044 -0.052601 -0.266303 0.050060 0.038974 -0.130717 -0.204224 -0.042648 -0.015670 -0.005193 ... 0.210442 0.028941 0.017754 0.194277 0.003770 -0.664877 -0.024434 0.167211 0.073147 -0.007912
DOL -0.160518 -0.010954 0.042704 -0.028079 0.084447 -0.018499 0.009780 -0.076932 -0.108125 0.070747 ... -0.011061 0.050432 -0.002232 0.013674 0.011408 -0.017905 -0.054481 -0.036818 -0.006025 0.750246
DOO -0.157289 -0.010757 0.036949 -0.038733 0.075571 -0.001145 0.031621 -0.091489 -0.088259 0.055132 ... -0.055132 0.074028 -0.024020 0.007641 0.047442 -0.035191 0.022945 0.006520 -0.044958 -0.650055
ENB -0.144415 -0.040819 -0.102101 -0.085299 -0.051066 -0.156501 -0.081331 0.078633 -0.006538 0.127160 ... -0.025530 -0.589933 0.005431 -0.129889 -0.084892 -0.052072 -0.082099 -0.049808 0.037564 -0.014443
IGM -0.127941 -0.005348 0.247802 0.140876 0.187187 -0.037198 -0.125438 -0.093726 -0.045313 0.087582 ... 0.074561 0.000017 -0.096667 0.006884 -0.005883 -0.111989 0.688758 -0.389601 0.034563 0.007848
NGD -0.062664 0.234990 -0.036112 -0.013709 -0.009670 0.062311 0.012907 0.014675 0.025940 0.013104 ... -0.030877 -0.015453 0.014054 0.012862 0.023706 -0.027772 -0.027923 0.001248 -0.021424 0.010592
PEY -0.149404 -0.056981 -0.021802 -0.196603 0.035194 0.165845 0.021249 -0.125862 0.031806 -0.043429 ... -0.052847 -0.053877 -0.111617 -0.135503 0.023964 0.078462 0.429540 0.743572 -0.106402 0.063064
RY -0.152320 -0.051308 -0.021665 -0.109855 0.021532 -0.048933 0.157040 0.052900 0.019120 0.112541 ... 0.295141 0.135316 0.403495 0.063244 0.502920 -0.047708 0.004983 0.080169 -0.040179 -0.001603
SIL -0.084934 0.290093 -0.017374 0.014017 -0.016193 0.030938 0.047216 -0.032776 -0.051777 -0.011255 ... 0.020233 0.049425 -0.035915 0.007104 0.017996 0.118807 0.029920 0.099958 0.842134 -0.022465
TD -0.148354 -0.069229 -0.062435 -0.100556 0.066975 -0.042147 0.179859 0.016946 -0.013172 0.087252 ... 0.022431 0.039052 -0.080281 0.324628 -0.353065 0.077180 -0.011293 0.042773 -0.001475 -0.024636
TVE -0.034162 0.019887 0.083680 0.052872 -0.134708 0.027806 -0.092218 -0.048004 0.667672 -0.072657 ... 0.015626 -0.004245 0.017087 -0.012480 -0.006102 0.001055 -0.010073 -0.019879 0.007495 0.015122

16 rows × 79 columns

Most Important Assets Log returns  using PCA

AGI ATS BMO BN CIX CNQ DOL DOO ENB IGM NGD PEY RY SIL TD TVE
Date
2019-09-19 0.028438 0.000000 0.001910 0.010767 -0.005358 0.001106 0.001718 0.000834 0.000284 0.002345 0.000000 -0.003262 0.006974 0.006826 0.005576 0.000000
2019-09-20 0.015456 -0.006543 0.001362 -0.004812 0.000000 0.010625 -0.000859 -0.002021 0.003970 -0.011009 -0.050010 -0.002179 0.009018 0.017198 0.002256 0.005486
2019-09-23 0.019743 0.000000 -0.005186 -0.014012 -0.001344 0.006900 -0.002149 -0.003786 -0.005391 0.000182 0.105361 0.002897 -0.004067 0.029322 0.000000 -0.005879
2019-09-24 0.008982 -0.002922 -0.007554 -0.010592 -0.046104 -0.017894 -0.005535 -0.005219 0.006239 -0.014626 0.030305 -0.006572 -0.004208 0.012903 -0.003996 -0.000786
2019-09-25 -0.053571 -0.006606 0.009195 0.008897 0.023661 -0.006655 -0.002179 -0.002736 -0.001698 0.013147 -0.085655 0.007663 0.004578 -0.043564 -0.002265 -0.003942
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2024-09-09 0.003309 0.032904 0.008346 0.022295 0.078256 0.007794 0.009506 0.009678 0.009855 0.010638 0.017022 0.005634 0.016810 0.006941 0.018057 0.003559
2024-09-10 0.029302 0.003042 -0.001567 0.001483 0.001090 -0.038970 -0.007407 -0.006583 -0.013079 0.009850 0.049393 0.000000 -0.005790 0.014389 -0.006529 -0.001778
2024-09-11 0.011167 0.006433 0.017460 0.017833 0.019425 0.005573 0.003615 0.002253 -0.000497 0.024374 0.058496 -0.008463 0.008226 0.021202 0.010587 0.001778
2024-09-12 0.060360 -0.011763 0.009205 0.019594 -0.009665 0.007689 0.009640 0.004162 0.006194 0.009738 0.090478 0.003770 0.005501 0.063724 0.002751 0.000000
2024-09-13 0.030923 -0.018879 0.005038 0.008339 0.021001 -0.008813 0.002067 0.005259 0.005910 0.006512 0.070146 0.015403 -0.002989 0.048606 0.004996 -0.000444

1255 rows × 16 columns

Most Important Assets Log returns  using Correlation method

AGI ATS BMO BN CIX CNQ DOL DOO ENB IGM NGD PEY RY SIL TD TVE
Date
2019-09-19 0.028438 0.000000 0.001910 0.010767 -0.005358 0.001106 0.001718 0.000834 0.000284 0.002345 0.000000 -0.003262 0.006974 0.006826 0.005576 0.000000
2019-09-20 0.015456 -0.006543 0.001362 -0.004812 0.000000 0.010625 -0.000859 -0.002021 0.003970 -0.011009 -0.050010 -0.002179 0.009018 0.017198 0.002256 0.005486
2019-09-23 0.019743 0.000000 -0.005186 -0.014012 -0.001344 0.006900 -0.002149 -0.003786 -0.005391 0.000182 0.105361 0.002897 -0.004067 0.029322 0.000000 -0.005879
2019-09-24 0.008982 -0.002922 -0.007554 -0.010592 -0.046104 -0.017894 -0.005535 -0.005219 0.006239 -0.014626 0.030305 -0.006572 -0.004208 0.012903 -0.003996 -0.000786
2019-09-25 -0.053571 -0.006606 0.009195 0.008897 0.023661 -0.006655 -0.002179 -0.002736 -0.001698 0.013147 -0.085655 0.007663 0.004578 -0.043564 -0.002265 -0.003942
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2024-09-09 0.003309 0.032904 0.008346 0.022295 0.078256 0.007794 0.009506 0.009678 0.009855 0.010638 0.017022 0.005634 0.016810 0.006941 0.018057 0.003559
2024-09-10 0.029302 0.003042 -0.001567 0.001483 0.001090 -0.038970 -0.007407 -0.006583 -0.013079 0.009850 0.049393 0.000000 -0.005790 0.014389 -0.006529 -0.001778
2024-09-11 0.011167 0.006433 0.017460 0.017833 0.019425 0.005573 0.003615 0.002253 -0.000497 0.024374 0.058496 -0.008463 0.008226 0.021202 0.010587 0.001778
2024-09-12 0.060360 -0.011763 0.009205 0.019594 -0.009665 0.007689 0.009640 0.004162 0.006194 0.009738 0.090478 0.003770 0.005501 0.063724 0.002751 0.000000
2024-09-13 0.030923 -0.018879 0.005038 0.008339 0.021001 -0.008813 0.002067 0.005259 0.005910 0.006512 0.070146 0.015403 -0.002989 0.048606 0.004996 -0.000444

1255 rows × 16 columns

Most Diversified Assets Log returns  using Stack Correlation Matrix/PCA Method

AGI ATS BMO BN CIX CNQ DOL DOO ENB IGM NGD PEY RY SIL TD TVE
Date
2019-09-19 0.028438 0.000000 0.001910 0.010767 -0.005358 0.001106 0.001718 0.000834 0.000284 0.002345 0.000000 -0.003262 0.006974 0.006826 0.005576 0.000000
2019-09-20 0.015456 -0.006543 0.001362 -0.004812 0.000000 0.010625 -0.000859 -0.002021 0.003970 -0.011009 -0.050010 -0.002179 0.009018 0.017198 0.002256 0.005486
2019-09-23 0.019743 0.000000 -0.005186 -0.014012 -0.001344 0.006900 -0.002149 -0.003786 -0.005391 0.000182 0.105361 0.002897 -0.004067 0.029322 0.000000 -0.005879
2019-09-24 0.008982 -0.002922 -0.007554 -0.010592 -0.046104 -0.017894 -0.005535 -0.005219 0.006239 -0.014626 0.030305 -0.006572 -0.004208 0.012903 -0.003996 -0.000786
2019-09-25 -0.053571 -0.006606 0.009195 0.008897 0.023661 -0.006655 -0.002179 -0.002736 -0.001698 0.013147 -0.085655 0.007663 0.004578 -0.043564 -0.002265 -0.003942
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2024-09-09 0.003309 0.032904 0.008346 0.022295 0.078256 0.007794 0.009506 0.009678 0.009855 0.010638 0.017022 0.005634 0.016810 0.006941 0.018057 0.003559
2024-09-10 0.029302 0.003042 -0.001567 0.001483 0.001090 -0.038970 -0.007407 -0.006583 -0.013079 0.009850 0.049393 0.000000 -0.005790 0.014389 -0.006529 -0.001778
2024-09-11 0.011167 0.006433 0.017460 0.017833 0.019425 0.005573 0.003615 0.002253 -0.000497 0.024374 0.058496 -0.008463 0.008226 0.021202 0.010587 0.001778
2024-09-12 0.060360 -0.011763 0.009205 0.019594 -0.009665 0.007689 0.009640 0.004162 0.006194 0.009738 0.090478 0.003770 0.005501 0.063724 0.002751 0.000000
2024-09-13 0.030923 -0.018879 0.005038 0.008339 0.021001 -0.008813 0.002067 0.005259 0.005910 0.006512 0.070146 0.015403 -0.002989 0.048606 0.004996 -0.000444

1255 rows × 16 columns

Most Diversified Assets Log returns  using Stacking Hierarchical Clustering, Correlation Analysis & PCA Method

IGM CNQ DOL DOO BN PEY ENB BMO TD NGD TVE
Date
2019-09-19 0.002345 0.001106 0.001718 0.000834 0.010767 -0.003262 0.000284 0.001910 0.005576 0.000000 0.000000
2019-09-20 -0.011009 0.010625 -0.000859 -0.002021 -0.004812 -0.002179 0.003970 0.001362 0.002256 -0.050010 0.005486
2019-09-23 0.000182 0.006900 -0.002149 -0.003786 -0.014012 0.002897 -0.005391 -0.005186 0.000000 0.105361 -0.005879
2019-09-24 -0.014626 -0.017894 -0.005535 -0.005219 -0.010592 -0.006572 0.006239 -0.007554 -0.003996 0.030305 -0.000786
2019-09-25 0.013147 -0.006655 -0.002179 -0.002736 0.008897 0.007663 -0.001698 0.009195 -0.002265 -0.085655 -0.003942
... ... ... ... ... ... ... ... ... ... ... ...
2024-09-09 0.010638 0.007794 0.009506 0.009678 0.022295 0.005634 0.009855 0.008346 0.018057 0.017022 0.003559
2024-09-10 0.009850 -0.038970 -0.007407 -0.006583 0.001483 0.000000 -0.013079 -0.001567 -0.006529 0.049393 -0.001778
2024-09-11 0.024374 0.005573 0.003615 0.002253 0.017833 -0.008463 -0.000497 0.017460 0.010587 0.058496 0.001778
2024-09-12 0.009738 0.007689 0.009640 0.004162 0.019594 0.003770 0.006194 0.009205 0.002751 0.090478 0.000000
2024-09-13 0.006512 -0.008813 0.002067 0.005259 0.008339 0.015403 0.005910 0.005038 0.004996 0.070146 -0.000444

1255 rows × 11 columns

Diversified Portfolio Assets Volatility 

IGM Volatility CNQ Volatility DOL Volatility DOO Volatility BN Volatility PEY Volatility ENB Volatility BMO Volatility TD Volatility NGD Volatility TVE Volatility
Date
2020-09-17 0.360603 0.810664 0.308874 0.298213 0.511776 0.388163 0.472484 0.495009 0.436419 0.851092 0.106949
2020-09-18 0.360840 0.810868 0.308948 0.298232 0.511690 0.388285 0.472608 0.495128 0.436527 0.851180 0.107394
2020-09-21 0.360633 0.812542 0.309930 0.299506 0.512432 0.389360 0.472802 0.495544 0.437326 0.851868 0.107525
2020-09-22 0.361150 0.812808 0.309925 0.299597 0.512336 0.389364 0.473020 0.495678 0.437342 0.846089 0.107430
2020-09-23 0.362196 0.813346 0.310158 0.299681 0.512973 0.390019 0.474124 0.495760 0.437471 0.854508 0.107572
... ... ... ... ... ... ... ... ... ... ... ...
2024-09-09 0.213149 0.280058 0.122660 0.121811 0.292684 0.163629 0.161075 0.226355 0.186543 0.573182 0.084597
2024-09-10 0.213319 0.282738 0.122913 0.122023 0.292288 0.163627 0.161460 0.226361 0.186667 0.575011 0.084177
2024-09-11 0.214513 0.281679 0.122394 0.121479 0.292176 0.163795 0.161413 0.226418 0.185932 0.577119 0.084182
2024-09-12 0.213909 0.280979 0.122713 0.121532 0.292604 0.163765 0.161069 0.226494 0.185897 0.583353 0.084043
2024-09-13 0.213971 0.281040 0.122721 0.121607 0.292210 0.164200 0.161088 0.226239 0.185794 0.586887 0.084009

1004 rows × 11 columns

Most Diversified Portfolio arithmetics details

Ticker mu expected_return variance Sigmas(volatilities) modifiy shape(Er)/𝝈 initial price Company Sector Industry
0 IGM 0.000747 0.000307 0.017516 0.042633 36.212482 IGM Financial Inc. Financial Services Asset Management
1 CNQ 0.000925 0.000943 0.030710 0.030125 10.011539 Canadian Natural Resources Limited Energy Oil & Gas Exploration and Production
2 DOL 0.000257 0.000147 0.012131 0.021183 38.585945 Dollarama Inc. Consumer Defensive Retail Defensive
3 DOO 0.000143 0.000144 0.012008 0.011938 36.010262 BRP Inc. Consumer Cyclical Vehicles & Parts
4 BN 0.000468 0.000511 0.022599 0.020709 27.440422 Brookfield Corporation Financial Services Asset Management
5 PEY 0.000306 0.000216 0.014697 0.020790 14.713406 Peyto Exploration & Development Corp. Energy Oil & Gas Exploration and Production
6 ENB 0.000373 0.000308 0.017539 0.021273 25.500664 Enbridge Inc. Energy Oil & Gas Storage/Transport
7 BMO 0.000303 0.000350 0.018704 0.016185 58.515537 Bank of Montreal Financial Services Banks
8 TD 0.000243 0.000282 0.016793 0.014462 45.858364 Toronto-Dominion Bank Financial Services Banks
9 NGD 0.000737 0.001866 0.043194 0.017052 1.230000 New Gold Inc. Basic Materials Metals & Mining
10 TVE 0.000003 0.000047 0.006884 0.000416 22.429239 Tamarack Valley Energy Ltd. Energy Oil & Gas Exploration and Production

4. Asset Pricing, Profit & Loss simation and Risk Projection¶

In this section, we will focus on :¶
  • ##### Monte Carlo simulation of stock price using cov matrix and cholesky decomposition
  • ##### Profit & Lost simulation
  • ##### VaR/CVaR calculation under current macroeconomic factors

Before going forward with our analysis, it is crucial to understand the form of the data distribution, here stock price and asset returns distribution over time in other to choose the appropriate model. The daily adjust closed prices charts above show that the stock prices movement and their returns over time folow an independent random process, with stock prices always positive and ncrease indefinitely. The distribution exhibits positive skewnes.The future price movement doesn't depend on its history, but it is determined by both its current state and some inherent randomness such as economic indicators, company performance, investor sentiment, geopolitical events, and unforeseen news. Therefore, the stock prices are considered stochastic. In general we don’t know the distribution of the stock prices, we only know that it is closed to the brownan motion stockastic process. Typically, the logarithm of the stock price follows a Brownian motion with drift.

Stochastic differential equation (SDE) of the stock price 𝑆(𝑡): $$ dS(t) = \mu S(t) \, d(t) + \sigma S(t) \, dW(t) $$ where:

  • $dS(t)$ is the variation (absolute change) of the stock price over the time interval d(t)
  • $\mu$ is the drift term (expected return),
  • $\sigma$ is the volatility of the stock,
  • $W(t)$ is a Wiener process (Geometric Brownian motion).

In a simple term, $dS(t) = S(t+d(t))$ , with $S(t)$ representing the stock price at the time $t$. logarithmicly the continuous compounding return of the stock over the interval $d(t)$ is $ r(t) = \log\left(\frac{S(t+d(t))}{S(t)}\right)$.
The volatility $\sigma$ is the square root of the variance. It provides a measure of the risk or uncertainty of the stock price. $\sigma$ is a key parameter of the Geometric Brownian motion that determines the stochastic variation of the stock price.
The variance of the stock price over a given period is a measure of the magnitude of expected price fluctuations. It is the mathematical expectation of the squared deviation between the price and its mean.
$W(t)$ is a random variable that follows a Wiener process. It is the random component of the stock price movement and is related to the variation of time d(t). The mathematical expression of this relationship is : $dW(t)=\epsilon \sqrt{dt}$. In this expression, the term "epsilon" represents a random variable whose distribution is normal with expected value of zero(mean zero) and variance equal 1. It's mathematical expection is $E(dW(t))=\sqrt{dt}E(epsilon)=0$ with the variance $Var(dW(t))=d(t)Var(epsilon)=d(t)$.
The variations of $W(t)$ are independent over time. In the case the company associated with that stock does not distribute a portion of its profits to shareholders in the form of dividend payments, the stochastic equation for the return of a stock is:$\frac{dS(t)}{S(t)} = \mu d(t) + \sigma dW(t)$. Therefore, it makes sense that the return on a stock does not depend on the price of the stock.

Let's now focus on the stochastic equation for the return. We will dig into this yow part : $\mu d(t) and \sigma dW(t)$.
The first part, $\mu d(t)$, is deterministic meaning that, using the historical data, we can calculate the expected change in the stock price over the small time interval $dt$, assuming no randomness. Essentially, $mu$ is the expected rate of return per unit time, and when multiplied by the stock price $S(t)$ and the time interval $dt$, it gives the expected change in the stock price due to predictable factors like steady growth, interest rates, or dividends.
Tthe second part, $\sigma dW(t)$, is the Stochastic or randomness part of the stock rate of return. It takes in to considaration the unpredictable fluctuations in the stock price due to various factors like market volatility, company-Specific news, or economic factors, geopolitical events,natural disasters and pandemic, investor behavior and sentiment, technological advances and disruptions, global economic interdependencies. The $dW(t)$ represents the random shock to the stock price, and $sigma$ scales this shock, making it more or less volatile. Let's look inside the solution of the stochatic equation of the stock price.

Quation1: $S(t) = S(0) \exp \left( \left(\mu - \frac{\sigma^2}{2}\right)t + \sigma W(t) \right)$

or in more details

Quation2: $S(t) = S(0) \exp \left( \left(\mu - \frac{\sigma^2}{2}\right)dt + \sigma \phi \sqrt{dt} \right)$

In the equation2, the term $ \phi$ represent correlated normal distributions with standard deviation equal 1 and expected value of zero(mean zero). But from Geometric Brownian Motion prostective, the stock price movement is independent over time(uncorrelated) and follow a log-normal distribution with a mean of zero and a standard deviation that depends on the time interval dt. In order to come out of this situation, we will procide as follow:

  • Calculate log returns of the stock prices
  • Calculate the expected return, the variance and the volatility of each stock
  • Calculate the variance-covariance matrix
  • Calculate cholesky decomposition matrix.
  • Simulate an uncorrelated random normal distribution with $mu = 0 and sigma = 1$(Z distribution)
  • Apply the cholesky matrix to the uncorrelated random normal distribution(Z distribution) in order to get a correlated random normal normal distribution with with $mu = 0 and sigma = 1$.
  • Use correlated random normal normal distribution as input for the stock price function.

After then, we will simulate the portfolio Profit & Loss and finanly we will calculate the portfolio VaR(value at Rick and the CVaR(conditional Value at Risk)

Correlation - Covariance & Cholesky decomposition¶

  • Covariance: Covariance measures the degree to which two variables (e.g., asset returns) move together. It tells us whether the returns of two assets tend to rise and fall together (positive covariance) or move in opposite directions (negative covariance). Zero Covariance means that there is no linear relationship between the assets' returns. A mix of assets with low or negative covariances can reduce overall portfolio risk.
  • Mathematical Formula:
    $$ Cov(X, Y) = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) $$ Where:

    • $( X_i )$ and $( Y_i )$ are the returns of assets $(X)$ and $(Y)$.
    • $( \bar{X} )$ and $( \bar{Y} )$ are the mean returns of $(X)$ and $(Y)$.
    • \$( n )$ is the number of observations.
  • Correlation: Correlation is a normalized version of covariance, which measures the strength and direction of the linear relationship between two variables (asset returns). Unlike covariance, correlation is dimensionless and always ranges between -1 and 1. Correlation is used to measure the degree of diversification in a portfolio. Combining assets with low or negative correlations can significantly reduce portfolio risk. Portfolio managers use correlation to understand how different assets are likely to behave relative to one another under various market conditions.
  • Mathematical Formula: $$ Correlation(X, Y) = [ \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y} ] $$ Where:

    • $Cov(X, Y)$ is the covariance of $( X )$ and $( Y )$.
    • $( \sigma_X )$ and $( \sigma_Y )$ are the standard deviations of $( X )$ and $( Y )$.
  • Cholesky decomposition: Cholesky decomposition is a mathematical technique used of decomposing a positive-definite matrix into the product of a lower triangular matrix and its transpose. We will use Cholesky decomposition methode to decompose the covariance matrix into the product of a lower triangular matrix and its transpose(cholesky Matrix). This will help us to generate correlated asset returns and the stock price.
  • For a positive-definite matrix $( A )$, the Cholesky decomposition is expressed as: $$ A = LL^\top $$ Where:
  • $( L )$ is a lower triangular matrix.
  • $( L^\top )$ is the transpose of $( L )$.
  • Suppose you have a covariance matrix of asset returns: $$ \Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \dots & \sigma_{1n} \\ \sigma_{21} & \sigma_{22} & \dots & \sigma_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \dots & \sigma_{nn} \end{pmatrix} $$ The Cholesky decomposition would allow you to write:
$$\Sigma = LL^\top$$

Where:

  • \$( L )$ is used to generate correlated random variables from uncorrelated normal variables, which is essential for realistic financial simulations.

    The following is the process to predict the portfolio Value At RIsk(VaR and Conditional Valut At Risk(CVaR) using Monte Carlo Simulation

1. Uncorrelated Normal Distribution Simulation
    An uncorrelated normal distribution describes a situation where random variables follow a normal distribution and have no linear relationship, resulting in a correlation of zero."

    Why will we use uncorrelated normal distributions? As noted in the Exploratory Data Analysis, real-world stock prices move
    independently and are non-stationary, meaning they generally increase over time, making the distribution time-dependent. Unlike normal
    distributions, which are stationary, stock prices often show more extreme values, or 'fat tails,' indicating larger price changes than
    a normal distribution would predict. Thus, uncorrelated normal distributions are used as a baseline to simulate real-world stock price
    movements and to better undestand the tail risks. How to minimize the fat-tail risk?

2. Correlated Normal Distribution using Cholrsky Decomposition

    This problem can be solved by applying Cholesky Decomposition process to the asset covarience matrix and the uncorelated normal distributions of the asset log returns.
    Normal Distribution: A symmetrical, bell-shaped distribution centered around the mean (μ), characterized by its mean and standard deviation (σ)
    Uncorrelated Variables: Random variables with zero covariance, indicating no linear relationship. However, they are not necessarily independent unless normally distributed.

3. Daily Returns Simulation
     $Daily _returns = e^{\left( \left(\mu - \frac{\sigma^2}{2}\right)dt + \sigma \phi \sqrt{dt} \right)}$
    In this formula $ \phi$ represent correlated normal distributions scaled by $sqrt{dt}$

4. Future Stock Price Simulation

    $S(t) = S(0) \exp \left( \left(\mu - \frac{\sigma^2}{2}\right)dt + \sigma \phi \sqrt{dt} \right)$      $S(0)$c is the initial asset price

5. Portfolio Price Simulation

    P(t) is the sum of each asset's price multiplied by its weight. If there are 𝑛 assets, the portfolio price at time 𝑡 is calculated by adding up the weighted prices of all the assets.

    $( P(t) = \sum_{i=1}^{n} w_i S_i(t)$

    Where:
    $w_i$ is the weight of asset 𝑖 in the portfolio
    $S_i(t)$ is the simulated price of asset𝑖 at time𝑡.

6. Portflio Profit & Loss Simulation
    Tthe Portflio Profit & Loss Simulation difference between the current asset prices and their initial prices, multiplied by their respective weights, to get the overall portfolio P&L.

    $\text{P&L}(t) = \sum_{i=1}^{n} w_i \times S_i(t) - S_i(0)$     Where:

    $\text{P&L}(t)$ is the profit or loss of the portfolio at time
    $wi$ is the weight of asset 𝑖 in the portfolio.
    𝑆 (t) is the price of asset 𝑖 at time 𝑡.
    𝑆𝑖(0) is the initial price of asset 𝑖 (at time 0).
    𝑛 is the number of assets in the portfolio

7. VaR and CVaR calculation

  •     VaR representing the maximum loss at a given confidence level
        $\text{VaR}_\alpha = - \inf \{ x \in \mathbb{R} : F(x) > \alpha \}$
        Where:
        $\alpha$ is the confidence level (e.g., 0.95 or 0.99).
        $F(x)$ is the cumulative distribution function of portfolio losses.

  •     CVaR providing the average of the losses exceeding VaR.
        $\text{CVaR}_\alpha = \mathbb{E}[ X | X \leq \text{VaR}_\alpha ]$
        Where:
        $\mathbb{E}[ X | X \leq \text{VaR}_\alpha ]$is the expected loss given that the loss $𝐿$ exceeds the $VaR$ at confidence level $α$.

In [12]:
def plotting_heatmap_for_correlation_matrix(log_returns, title):
    plt.figure(figsize=(20, 8))
    #sns.heatmap(log_returns.corr(), annot=True)
    sns.heatmap(log_returns.corr(), annot=True, cmap='coolwarm', vmin=-1, vmax=1, linewidths=0.5, fmt=".2f")
    plt.yticks(rotation=360)
    plt.title(title, pad= 20)

def plotting_heatmap_for_covariance_matrix(covariance_matrix, title):
    plt.figure(figsize=(20, 8))
    sns.heatmap(covariance_matrix, annot=True, cmap='coolwarm', fmt=".5f",
            linewidths=0.5, vmin=covariance_matrix.min().min(), vmax=covariance_matrix.max().max())
    plt.yticks(rotation=360)
    plt.title(title, pad= 20)

    
def variance_covariance_matrix(log_returns):
    return log_returns.cov() 


#-----------------------------------------------------------------------------------------------------------------------------------------------
#create cholesky matrice: let's apply cholesky decomposition to the covarience matrix 
# Input: covarience matrice
# output: cholesky matrice data frame
#------------------------------------------------------------------------------------------------------------------------------------------------
def create_cholesky_matrix(covar_mat):
    cholesky_matrix_data = np.linalg.cholesky(covar_mat)
    return pd.DataFrame(cholesky_matrix_data[0:,0:], columns=covar_mat.columns.tolist(), index=covar_mat.columns.tolist())



#--------------------------------------------------------------------------------------------------------
# here let's simulate 10000 uncorelated normal distribution  iterations to calculate the stock price.
#input:covariance matrice and number of iteration
#output 10000 Z score for each stock price: uncorelated normal z core array and  it's  data frame 
# here we simulate 10000 uncorelated normal distribution  iterations to calculate the stock price.
#t_intervals = 250
#number_of_assets = len(covar_mat.columns.tolist())
#Z = norm.ppf(np.random.rand(iterations,number_of_assets ))
#--------------------------------------------------------------------------------------------------------
def simulate_uncorelated_normal_distribution(covar_mat,iterations):
    number_of_assets = len(covar_mat.columns.tolist())
    #z score array
    Z = norm.ppf(np.random.rand(iterations,number_of_assets ))
    Z_df = pd.DataFrame(data=Z[0:,0:],index=[i for i in range(Z.shape[0])], columns=covar_mat.columns.tolist())
    return Z,Z_df

#-----------------------------------------------------------------------------------------------------
#Description: generate correlated normal distribution using transposed cholesky matrix and uncorelated 
#normal Z score distribution
#=MMULT(unCorrelated_normal_distribution,TRANSPOSE(cholesky_matrix))
#------------------------------------------------------------------------------------------------------
def generate_correlated_normal_distribution(cholesky_matrix_data_df,Z):
    Correlated_Normals_Z = np.matmul(Z, cholesky_matrix_data_df.T)
    Correlated_Normals_Z_arr = np.array(Correlated_Normals_Z)
    return Correlated_Normals_Z, Correlated_Normals_Z_arr

#--------------------------------------------------------------------------------------------------------------
#Description: Daily returns simulation (returns simulation = 𝒆^(((𝝁𝒊−(𝟏/𝟐)𝝈𝒊𝟐)(𝒕𝟐−𝒕𝟏)+𝝈𝒊√((𝒕𝟐−𝒕𝟏) ) 𝝓)))
#Inputs:
#   𝝓 : Correlated_Normals_Z
#   𝝓_arr : Correlated_Normals_Z_array
#   𝝁 : log_returns.mean()
#   variance: log_returns.var()
#   𝝈 : log_returns.std()
#   𝓢1 : initial_prices
#   delta_t = 1
#output: 
 #--------------------------------------------------------------------------------------------------------------   
def simulate_daily_returns(𝝓,𝝓_arr, 𝝁,𝝈,delta_t):
    daily_returns_list_df = np.zeros_like(𝝓_arr)
    daily_returns_list_df = np.exp((𝝁 - 0.5 * 𝝈** 2) * delta_t + 𝝈* delta_t ** 0.5 *𝝓)
    return daily_returns_list_df

#--------------------------------------------------------------------------------------------------------------------------
# Description: Stock price simulation
def stock_prices_simulation(initial_prices,daily_returns_list_df ):
    𝓢1_list = []
    𝓢1_list = initial_prices.values
    expo_r = daily_returns_list_df
    future_stock_price_list_df= pd.DataFrame(data=daily_returns_list_df[0:0:],
                                             index=[i for i in range(daily_returns_list_df.shape[0])],
                                             columns=expo_r.columns.tolist())
    for (index, column) in enumerate(expo_r):
        future_stock_price_list_df[column] = pd.DataFrame(data=𝓢1_list[index]*expo_r[column].values)
    return future_stock_price_list_df
  

    #Initial portfolio price
def calculate_initial_portfolio_price(𝓢1):
    return 𝓢1.values.sum()

#Portfolio price simulation
def simulated_portfolio_price(future_stock_price_df):
    simulated_portfolio_price_row_sum = []
    for i in range(len(future_stock_price_df)):
        simulated_portfolio_price_row_sum.append(future_stock_price_df.iloc[i].sum())
    simulated_portfolio_price_df = pd.DataFrame(data = simulated_portfolio_price_row_sum, columns=['portfolio_prices'])
    return simulated_portfolio_price_df

#-----------------------------------------------------------------------------------------------------------
#Description: Portfolio Profit and loss calculation
# input:simulated_portfolio_price_df,portfolio_initial_price
# output :portfolio_profit_and_loss_df
#-----------------------------------------------------------------------------------------------------------
def calculate_prtfolio_profit_and_loss(simulated_portfolio_price_df, portfolio_initial_price):
    portfolio_profit_and_loss_df = simulated_portfolio_price_df - portfolio_initial_price
    portfolio_profit_and_loss_df.columns = ['profit_&_lost']
    return portfolio_profit_and_loss_df

def set_portfolio_price_profit_and_Loss_simulation_df(simulated_portfolio_price_df, portfolio_profit_and_loss_df):
    return  pd.DataFrame({'simulated_portfolio_price': simulated_portfolio_price_df['portfolio_prices'].values,
                          'Simulated Portfolio Profit & Lost': portfolio_profit_and_loss_df['profit_&_lost'].values})


#-------------------------------------------------------------------------------------------------------------------------
# Description:sorting profit and loss ascendante; confifence level rank; Var calculation;CVar calculation
# Input: 
# Output:
#-------------------------------------------------------------------------------------------------------------------------

def calculate_portfolio_Var_and_CVar(portfolio_profit_and_loss_df, confidence_level):
    #sorting profit and loss ascendante
    lportfolio_profit_and_loss_df = portfolio_profit_and_loss_df.sort_values(by='profit_&_lost', ascending=True)
    lportfolio_profit_and_loss_df = portfolio_profit_and_loss_df.reset_index(drop=True)
    #confifence level rank ( 95% confidence lavel)
    rank = int((1-confidence_level)*len(lportfolio_profit_and_loss_df))-1
    #Var calculation
    VaR = portfolio_profit_and_loss_df.iloc[rank]['profit_&_lost']
    #CVar calculation
    port_folio_lost_beyond_VaR = portfolio_profit_and_loss_df[:rank]
    CVaR = np.average(port_folio_lost_beyond_VaR)
    return VaR, CVaR
 
#--------------------------------------------------------------------------------------------------------------------------
#Decription: Profit and loss summary statistics. Minimum lost, maximum lost, mean(moderate lost)lost standart deviation, 
#            Value at risk(Var),Conditional value-at-risk (CVaR)
#Input :portfolio_profit_and_loss_df,VaR,CVaR
#Output:
#--------------------------------------------------------------------------------------------------------------------------
def profit_and_loss_summary_statistics(portfolio_profit_and_loss_df,VaR,CVaR): 
    VaR_and_CVaR_df = pd.DataFrame([{'VaR':VaR, 'CVaR':CVaR}]).transpose()
    VaR_and_CVaR_df = VaR_and_CVaR_df.rename(columns={0:'profit_&_lost'})
    portfolio_profit_and_loss_stat_df =  portfolio_profit_and_loss_df.agg(['min', 'max', 'mean', 'std'])
    return pd.concat([portfolio_profit_and_loss_stat_df,VaR_and_CVaR_df], ignore_index=False)

# Plot a histogram
def profit_lost_summary(p_portfolio_profit_and_loss_df, p_portfolio_profit_and_loss_time_horizons_df):   
    fig, ax = plt.subplots(1,2, figsize=(20, 8))
    p_portfolio_profit_and_loss_df.plot.kde(ax=ax[0], legend=True, title='Distribution: Profit & Lost')
    p_portfolio_profit_and_loss_df.plot.hist(density=True, ax=ax[0])
    ax[0].set_ylabel('Probability')
    ax[0].grid(axis='y')
    ax[0].set_facecolor('#d8dcd6')
    
    bars = p_portfolio_profit_and_loss_time_horizons_df.plot(kind='bar', ax=ax[1], colormap='viridis', alpha=0.7, width=2, title='Profit & Lost Time Horizons')

    # Customize the plot
    #ax.set_title('Marks by Period for Each Student')
    ax[1].set_xlabel('Time Horizon')
    ax[1].set_ylabel('Value')
    ax[1].grid( axis='y')
    ax[1].spines['top'].set_visible(True)
    ax[1].spines['right'].set_visible(True)
    ax[1].spines['bottom'].set_visible(True)
    ax[1].spines['left'].set_visible(True)
    ax[1].axes.get_yaxis().set_visible(False)
    
    
    # Display bar values on top of each bar
    for bar in bars.patches:
        height = bar.get_height()
        ax[1].text(
        bar.get_x() + bar.get_width() / 2,  # x-coordinate
        height - 0.3,                            # y-coordinate
        f'{height}',                       # text
        ha='center',                       # horizontal alignment
        va='bottom'                        # vertical alignment
         )

    ax[1].legend(title='statistics', bbox_to_anchor=(1.05, 1), loc='upper left')
    plt.subplots_adjust(wspace=0.05)   
    plt.show()
    
    
def portfolio_profit_and_loss_time_horizons_df(p_portfolio_daily_profit_and_loss_df):
    
    time_horizons = {
    'Daily': 1,
    'Weekly': 5,
    'Biweekly': 10,
    'Monthly': 21,
    'Quarterly': 63,
    'Annual': 252
    }
    portfolio_profit_and_loss_time_horizons_list = []
    portfolio_profit_and_loss_time_horizons_df =pd.DataFrame()
    for horizon, days in time_horizons.items():
        portfolio_profit_and_loss_time_horizons_list = []
        for value in p_portfolio_daily_profit_and_loss_df['profit_&_lost']:
            scaling_factor = np.sqrt(days)
            portfolio_profit_and_loss_time_horizons_list.append(round(value * scaling_factor,1))
        portfolio_profit_and_loss_time_horizons_df[[ horizon + 'Profit & Lost']] =pd.DataFrame({ horizon + 'Profit & Lost': portfolio_profit_and_loss_time_horizons_list})
    portfolio_profit_and_loss_time_horizons_df.index = p_portfolio_daily_profit_and_loss_df.index 
    portfolio_profit_and_loss_time_horizons_df = portfolio_profit_and_loss_time_horizons_df.rename(index={'Statistics': portfolio_profit_and_loss_time_horizons_df.index})
    return portfolio_profit_and_loss_time_horizons_df
                                                   

def summary_statistics_graph_and_table(portfolio_profit_and_loss_df,portfolio_profit_and_loss_time_horizons_df):
    profit_lost_summary(portfolio_profit_and_loss_df,portfolio_profit_and_loss_time_horizons_df)      
    profit_and_loss_summary_statistics_df = profit_and_loss_summary_statistics(portfolio_profit_and_loss_df,VaR,CVaR)
    display(portfolio_profit_and_loss_time_horizons_df.T)

    
#-------------------------------------------------------Data Setting----------------------------------------------------------
covar_mat = variance_covariance_matrix(most_diversify_portfolio_assets_log_returns_df) 
cholesky_matrix_data_df = create_cholesky_matrix(covar_mat)

#daily_returns_list_df = simulate_daily_returns(Correlated_Normals_Z,Correlated_Normals_Z_arr, log_returns.mean(),log_returns.std(),1)
Z, Z_df = simulate_uncorelated_normal_distribution(covar_mat,10000)
Correlated_Normals_Z, Correlated_Normals_Z_arr= generate_correlated_normal_distribution(cholesky_matrix_data_df,Z)

daily_returns_df = simulate_daily_returns(Correlated_Normals_Z,Correlated_Normals_Z_arr, 
                                               most_diversify_portfolio_assets_log_returns_df.mean(),
                                               most_diversify_portfolio_assets_log_returns_df.std(),1)


future_stock_price_df = stock_prices_simulation(most_diversify_portfolio_assets_initial_prices,daily_returns_df) 

simulated_portfolio_price_df= simulated_portfolio_price(future_stock_price_df)
initial_portfolio_prices = calculate_initial_portfolio_price(most_diversify_portfolio_assets_initial_prices)
portfolio_profit_and_loss_df = calculate_prtfolio_profit_and_loss(simulated_portfolio_price_df, initial_portfolio_prices)

simulated_portfolio_price_profit_and_Loss_df = set_portfolio_price_profit_and_Loss_simulation_df(simulated_portfolio_price_df, 
                                                                                                portfolio_profit_and_loss_df)
VaR, CVaR = calculate_portfolio_Var_and_CVar(portfolio_profit_and_loss_df, 0.95)
profit_and_loss_summary_statistics_df = profit_and_loss_summary_statistics(portfolio_profit_and_loss_df,VaR,CVaR)
portfolio_profit_and_loss_time_horizons_df = portfolio_profit_and_loss_time_horizons_df(profit_and_loss_summary_statistics_df)


#----------------------------------------------------------Data Printing ----------------------------------------------------
plotting_heatmap_for_correlation_matrix(most_diversify_portfolio_assets_log_returns_df, 
                                        'Correlation Matrix of the Most Diversified portfolio Asset Log Returns')
plotting_heatmap_for_covariance_matrix(covar_mat, 'Covariance Matrix of the Most Diversified portfolio  Asset Log Returns')
plotting_heatmap_for_covariance_matrix(cholesky_matrix_data_df, 'Cholesky Matrix of the Most Diversified portfolio Asset Log Returns')
print('\n Uncorrelated normal Z simulation\n')
display(Z_df)
print('\nCorrelated Normal Z distribution\n')
display(Correlated_Normals_Z)
print('\nDaily returns simulation\n')
display(daily_returns_df)
print('\nFuture stock price simulation\n')
display(future_stock_price_df)
print('initial_portfolio_prices')
display(initial_portfolio_prices)
print('\nSimulated Portfolio Prices - Profit & Lost')
display(simulated_portfolio_price_profit_and_Loss_df)
summary_statistics_graph_and_table(portfolio_profit_and_loss_df, portfolio_profit_and_loss_time_horizons_df.T)
 Uncorrelated normal Z simulation

IGM CNQ DOL DOO BN PEY ENB BMO TD NGD TVE
0 -1.461680 0.731590 0.810047 -0.082086 -0.259016 0.189051 -1.773228 -1.266670 1.144792 -1.618592 0.012877
1 0.483978 0.423525 -1.770381 0.410302 -0.028886 -1.434581 -1.187604 -1.127606 -0.407771 -0.863672 -0.848371
2 0.657793 0.753622 1.538440 0.515229 -0.321961 0.188501 1.639347 -0.237593 -0.250557 -0.349359 0.079098
3 0.617868 -0.821943 -0.218321 0.068942 0.690773 1.223608 -0.539007 -1.613701 -0.804739 0.314620 -0.901479
4 -0.722329 -0.386383 -0.793887 -0.235237 -1.285817 -0.405252 0.953291 3.262097 0.213284 2.239812 0.298679
... ... ... ... ... ... ... ... ... ... ... ...
9995 -2.035752 -0.793953 -0.097404 -0.392294 1.645817 1.182710 -0.418856 -0.823955 -2.208061 0.032592 0.292743
9996 -0.774168 -1.673295 -1.137698 -1.072200 1.338937 -1.007644 1.648649 1.007604 -0.101299 -2.052050 -0.112984
9997 1.294729 -1.421436 -1.756090 0.806766 -1.605433 0.943934 0.268175 0.009367 0.997463 0.141327 -0.466869
9998 -0.599911 -1.245707 0.178666 -1.053268 -1.088356 -0.291844 0.258986 -1.381982 0.852582 -1.609754 -0.150586
9999 1.036581 -1.073592 0.306125 -1.003137 1.047839 -0.112428 -0.194010 0.632098 0.484087 0.362072 -1.113140

10000 rows × 11 columns

Correlated Normal Z distribution

IGM CNQ DOL DOO BN PEY ENB BMO TD NGD TVE
0 -0.025602 0.002590 -0.003664 -0.003429 -0.015230 -0.001756 -0.019467 -0.020175 -0.002820 -0.070456 -0.002241
1 0.008477 0.017874 -0.006296 -0.005135 -0.001256 -0.015184 -0.014876 -0.018673 -0.017608 -0.036767 -0.005106
2 0.011522 0.029301 0.020064 0.020910 0.021292 0.020065 0.036729 0.024130 0.021542 0.005931 0.002314
3 0.010822 -0.015519 0.000054 0.000101 0.011766 0.011117 -0.005548 -0.013428 -0.010368 0.019298 -0.004143
4 -0.012652 -0.019763 -0.013683 -0.013852 -0.036834 -0.020416 -0.008089 0.010910 -0.002854 0.071850 -0.000295
... ... ... ... ... ... ... ... ... ... ... ...
9995 -0.035657 -0.047399 -0.022118 -0.022174 -0.014352 -0.007107 -0.024033 -0.028717 -0.037791 -0.019578 0.001180
9996 -0.013560 -0.056609 -0.022505 -0.024839 -0.013373 -0.028983 -0.010824 -0.014633 -0.019534 -0.112615 -0.003070
9997 0.022678 -0.024054 -0.007648 -0.005492 -0.023331 -0.004314 -0.011070 -0.013816 -0.004336 0.005176 -0.002689
9998 -0.010508 -0.042433 -0.009732 -0.012344 -0.032846 -0.018938 -0.017168 -0.034929 -0.017782 -0.074595 -0.003511
9999 0.018156 -0.017445 0.006240 0.002964 0.023460 0.004008 0.000067 0.012019 0.011458 0.017084 -0.007012

10000 rows × 11 columns

Daily returns simulation

IGM CNQ DOL DOO BN PEY ENB BMO TD NGD TVE
0 1.000145 1.000533 1.000139 1.000030 0.999868 1.000172 0.999878 0.999750 1.000055 0.996766 0.999964
1 1.000742 1.001003 1.000107 1.000010 1.000184 0.999974 0.999958 0.999779 0.999806 0.998217 0.999944
2 1.000795 1.001354 1.000427 1.000322 1.000694 1.000493 1.000864 1.000579 1.000464 1.000060 0.999995
3 1.000783 0.999977 1.000184 1.000072 1.000479 1.000361 1.000122 0.999877 0.999928 1.000637 0.999951
4 1.000372 0.999847 1.000017 0.999905 0.999380 0.999897 1.000077 1.000332 1.000054 1.002911 0.999977
... ... ... ... ... ... ... ... ... ... ... ...
9995 0.999969 0.998998 0.999915 0.999805 0.999888 1.000093 0.999798 0.999591 0.999467 0.998959 0.999987
9996 1.000356 0.998716 0.999910 0.999773 0.999910 0.999772 1.000029 0.999854 0.999774 0.994952 0.999958
9997 1.000991 0.999715 1.000091 1.000005 0.999685 1.000134 1.000025 0.999869 1.000029 1.000027 0.999961
9998 1.000409 0.999151 1.000065 0.999923 0.999470 0.999919 0.999918 0.999475 0.999803 0.996588 0.999955
9999 1.000912 0.999918 1.000259 1.000107 1.000743 1.000256 1.000220 1.000353 1.000294 1.000542 0.999931

10000 rows × 11 columns

Future stock price simulation

IGM CNQ DOL DOO BN PEY ENB BMO TD NGD TVE
0 36.217730 10.016877 38.591307 36.011345 27.436813 14.715933 25.497549 58.500936 45.860863 1.226022 22.428426
1 36.239356 10.021580 38.590074 36.010607 27.445479 14.713029 25.499603 58.502580 45.849476 1.227807 22.427984
2 36.241288 10.025098 38.602416 36.021871 27.459467 14.720653 25.522693 58.549436 45.879630 1.230074 22.429129
3 36.240844 10.011308 38.593047 36.012871 27.453556 14.718717 25.503775 58.508320 45.855051 1.230784 22.428132
4 36.225946 10.010003 38.586617 36.006838 27.423421 14.711897 25.502638 58.534960 45.860838 1.233581 22.428726
... ... ... ... ... ... ... ... ... ... ... ...
9995 36.211352 10.001511 38.582669 36.003240 27.437357 14.714775 25.495508 58.491590 45.833938 1.228719 22.428954
9996 36.225370 9.998683 38.582487 36.002087 27.437964 14.710045 25.501415 58.507001 45.847993 1.223791 22.428298
9997 36.248371 10.008684 38.589442 36.010453 27.431790 14.715379 25.501305 58.507895 45.859696 1.230034 22.428357
9998 36.227307 10.003037 38.588466 36.007490 27.425892 14.712217 25.498577 58.484794 45.849342 1.225803 22.428230
9999 36.245500 10.010716 38.595943 36.014109 27.460812 14.717180 25.506287 58.536174 45.871861 1.230666 22.427689

10000 rows × 11 columns

initial_portfolio_prices
316.50785970687866
Simulated Portfolio Prices - Profit & Lost
simulated_portfolio_price Simulated Portfolio Profit & Lost
0 316.503801 -0.004059
1 316.527573 0.019713
2 316.681756 0.173896
3 316.556406 0.048546
4 316.525465 0.017606
... ... ...
9995 316.429613 -0.078246
9996 316.465135 -0.042725
9997 316.531405 0.023545
9998 316.451154 -0.056706
9999 316.616938 0.109079

10000 rows × 2 columns

DailyProfit & Lost WeeklyProfit & Lost BiweeklyProfit & Lost MonthlyProfit & Lost QuarterlyProfit & Lost AnnualProfit & Lost
min -0.2 -0.5 -0.7 -1.1 -1.9 -3.7
max 0.5 1.0 1.4 2.1 3.6 7.2
mean 0.1 0.1 0.2 0.3 0.5 1.0
std 0.1 0.2 0.2 0.4 0.6 1.2
VaR 0.0 0.1 0.1 0.2 0.4 0.7
CVaR 0.1 0.1 0.2 0.3 0.5 1.0

A Portfolio Optimization Model¶

Portfolio optimization involves determining the most effective asset allocation to achieve specific investment objectives, typically aiming to maximize returns while minimizing risk. In this section, we will calculate the portefolio expected return and volatility. We will then use Monte Carlo method to simulate 10000 trials the portefolio expected return and volatility. We will generate 10000 random portfolios with different asst allocations. we will plot these portfolios on a risk-return graph to create the Random Efficient Frontier. Our aim here is to select the optimal portfolios. Portfolios with the highest expected return and minimal risk. In order to achieve this, will use machine learning techniques to approximate a 2 degree polynomial function that perfectly fit the uper-bound of the random efficient frontier plot. Finally, we will combined K-means clustering technique with efficient frontier modeling to dig into the random generated portfolios and predict different type of investment risk tolerance(very conservative, conservative, moderate, aggressive and very aggressive)

Portfolio Expected Return and Volatility Simulation - Random Efficient Frontier¶

In [13]:
def portfolio_random_weight_array_df(assets_returns_df):
    #random portfolio weigh simulation
    number_of_assets = len(assets_returns_df.columns.tolist())
    random_array = np.random.rand(1,number_of_assets )
    random_array_df = pd.DataFrame(random_array, columns = assets_returns_df.columns.tolist())
    random_weight_df = random_array_df/random_array_df.values.sum()
    return random_weight_df
    
def portfolio_expected_Return(random_weight_df,log_returns):
    assets_expected_returns = log_returns.mean()
    weited_expected_returns = assets_expected_returns * random_weight_df
    portfolio_expected_return_ = weited_expected_returns.values.sum()
    return 100*portfolio_expected_return_

def  portfolio_volatility(varcovar,w):
    transpose_w = w.T
    σp = np.sqrt(np.matmul(np.matmul(w,varcovar),w.T))
    return 100*σp[0][0]
In [14]:
def efficient_frontiere_plot(portfolio_trails_simulation_df):
    display(portfolio_trails_simulation_df)
    #fig, ax = plt.subplots()
    portfolio_trails_simulation_df.plot(x='σp', y='E_rp', kind='scatter', figsize=(10, 6));
    plt.xlabel('Expected Volatility')
    plt.ylabel('Expected Return')
    plt.title('Random portfolios Efficient Frontier')
    
#efficient_frontiere_plot(portfolio_trails_simulation_df)
In [15]:
def generate_excess_return(log_returns_df):
    𝝁 = log_returns_df.mean()
    𝝁_list = []
    𝝁_list = 𝝁.values
    X_df= pd.DataFrame(data=log_returns_df[0:0:],
                       index=log_returns_df.index.to_list(), #[i for i in range(log_returns_df.shape[0])],
                       columns=log_returns_df.columns.tolist())
    
    assets_list = log_returns_df.columns.tolist()
    for index in range(len(assets_list)):   
            X_df[assets_list[index]] =  log_returns_df[assets_list[index]].values - 𝝁_list[index]
    return X_df

#Portfolio Statistics
def portfolio_arihtmetics(log_returns_df,index_adj_close_price_df):
    return pd.DataFrame({'mu expected_return':log_returns_df.mean(),
                        'variance':log_returns_df.var(),
                        'Sigmas(volatilities)':log_returns_df.std(),
                        'modifiy shape(Er)/𝝈':log_returns_df.mean()/log_returns_df.std(),
                        'initial price':index_adj_close_price_df.iloc[0]}).transpose()


def excess_return_varcovar(X_df): 
    return X_df.cov()

def get_uncorrelated_assets_index_adj_close_price_df(index_adj_close_price_df, uncorrelated_assets_list):
    return index_adj_close_price_df[uncorrelated_assets_list]
In [16]:
def uncorelated_portfolio_trails_simulation(log_returns, most_diversify_portfolio_assets_list, trial):
    
    σp_list = []
    E_rp_list = []
    random_weight_array_df_rows_list = []
    excess_return_df = generate_excess_return(log_returns[most_diversify_portfolio_assets_list])
    
    for i in  range(0, trial):         
        #random_weight_array_df = portfolio_random_weight_array_df(uncorrelated_assets_returns_log_returns_df(log_returns, 
        #                                                                                most_diversify_portfolio_assets_list))
        random_weight_array_df = portfolio_random_weight_array_df(log_returns[most_diversify_portfolio_assets_list])
        
        random_weight_array_df_rows_list.append(random_weight_array_df)
        
        E_rp_list.append(portfolio_expected_Return(random_weight_array_df,log_returns[most_diversify_portfolio_assets_list]))
        
        σp_list.append(portfolio_volatility(excess_return_varcovar(excess_return_df),random_weight_array_df))
    
    uncorelated_portfolio_trails_simulation_df =  pd.DataFrame({'σp':σp_list,'E_rp':E_rp_list}, index=[i for i in range(0,trial)])
    σp = uncorelated_portfolio_trails_simulation_df['σp']
    E_rp = uncorelated_portfolio_trails_simulation_df['E_rp']
    sharpes_rat = E_rp/σp
    uncorelated_portfolio_trails_simulation_sharpes_ratio_df = pd.DataFrame({'σp':σp,'E_rp':E_rp,'sharpes_ratio':sharpes_rat})
    
    random_weight_array_all_rows_df = pd.concat(random_weight_array_df_rows_list, axis=0,ignore_index=True)
    uncorrelated_weighted_portfolio_trails_simulation_df = uncorelated_portfolio_trails_simulation_sharpes_ratio_df.merge(random_weight_array_all_rows_df, 
                                                                                                                         left_index=True, right_index=True)
    
    return uncorelated_portfolio_trails_simulation_df,uncorelated_portfolio_trails_simulation_sharpes_ratio_df, \
                                        random_weight_array_all_rows_df,uncorrelated_weighted_portfolio_trails_simulation_df
In [17]:
uncorelated_portfolio_trails_simulation_df,uncorelated_portfolio_trails_simulation_sharpes_ratio_df, random_weight_array_all_rows_df, \
                                   uncorrelated_weighted_portfolio_trails_simulation_df = \
                                   uncorelated_portfolio_trails_simulation(log_returns, most_diversify_portfolio_assets_list, 10000)

  
X_df =generate_excess_return(most_diversify_portfolio_assets_log_returns_df)
Excess_return_varcovar = excess_return_varcovar(X_df)
display(Excess_return_varcovar)
efficient_frontiere_plot(uncorrelated_weighted_portfolio_trails_simulation_df)
IGM CNQ DOL DOO BN PEY ENB BMO TD NGD TVE
IGM 0.000307 0.000216 0.000153 0.000145 0.000263 0.000138 0.000150 0.000183 0.000158 0.000167 0.000019
CNQ 0.000216 0.000943 0.000237 0.000228 0.000382 0.000270 0.000387 0.000384 0.000332 0.000240 0.000018
DOL 0.000153 0.000237 0.000147 0.000141 0.000209 0.000138 0.000155 0.000177 0.000159 0.000143 0.000012
DOO 0.000145 0.000228 0.000141 0.000144 0.000204 0.000137 0.000149 0.000172 0.000154 0.000142 0.000014
BN 0.000263 0.000382 0.000209 0.000204 0.000511 0.000244 0.000263 0.000319 0.000278 0.000197 0.000027
PEY 0.000138 0.000270 0.000138 0.000137 0.000244 0.000216 0.000180 0.000207 0.000191 0.000141 0.000014
ENB 0.000150 0.000387 0.000155 0.000149 0.000263 0.000180 0.000308 0.000247 0.000219 0.000152 0.000015
BMO 0.000183 0.000384 0.000177 0.000172 0.000319 0.000207 0.000247 0.000350 0.000270 0.000149 0.000014
TD 0.000158 0.000332 0.000159 0.000154 0.000278 0.000191 0.000219 0.000270 0.000282 0.000139 0.000011
NGD 0.000167 0.000240 0.000143 0.000142 0.000197 0.000141 0.000152 0.000149 0.000139 0.001866 0.000029
TVE 0.000019 0.000018 0.000012 0.000014 0.000027 0.000014 0.000015 0.000014 0.000011 0.000029 0.000047
σp E_rp sharpes_ratio IGM CNQ DOL DOO BN PEY ENB BMO TD NGD TVE
0 1.591041 0.047869 0.030087 0.103417 0.154074 0.045688 0.100111 0.069420 0.068036 0.033478 0.130465 0.129491 0.130359 0.035460
1 1.461517 0.047157 0.032266 0.284562 0.020576 0.011801 0.095588 0.057754 0.032894 0.117483 0.157138 0.105801 0.093760 0.022643
2 1.867592 0.054864 0.029377 0.133012 0.128207 0.021695 0.044347 0.025555 0.094248 0.004054 0.057632 0.174658 0.294028 0.022565
3 1.423179 0.042317 0.029734 0.152672 0.025634 0.025288 0.045849 0.041366 0.076287 0.155829 0.118562 0.107588 0.148332 0.102592
4 1.282241 0.038346 0.029906 0.141923 0.040289 0.110578 0.118217 0.135731 0.027192 0.140439 0.012431 0.043684 0.075764 0.153753
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
9995 1.279847 0.037943 0.029647 0.183991 0.012637 0.167984 0.118589 0.066875 0.106072 0.015518 0.017766 0.134697 0.084626 0.091246
9996 1.345417 0.038060 0.028289 0.153573 0.003607 0.083775 0.096908 0.012720 0.101108 0.018303 0.113543 0.076000 0.176751 0.163713
9997 1.636396 0.045658 0.027902 0.008815 0.129299 0.082767 0.010676 0.147832 0.106178 0.121826 0.081679 0.133596 0.140138 0.037195
9998 1.413588 0.032336 0.022875 0.017752 0.055286 0.053208 0.223711 0.151653 0.092482 0.189462 0.026228 0.133031 0.003929 0.053257
9999 1.380792 0.041466 0.030030 0.118761 0.050469 0.127756 0.097175 0.086228 0.069522 0.121345 0.000650 0.005177 0.168017 0.154899

10000 rows × 14 columns

In [18]:
#--------------------------------------------------Efficient Frontiere Optimal Points-----------------------------------------------
# get data frame top1 portfolio
# selecting the optimal portfolios:portfolios with expected return higher or equal the minimun risky portfolio
#sort the optimal portfolio data frame by selected value:ascending=True
#return the data frame
#---------------------------------------------------------------------------------------------------------------------------------
def efficient_frontiere_selected_sharpe_ratio_portfolio_df(uncorrelated_weighted_portfolio_trails_simulation_df,selected_col): 
    
    uncorrelated_weighted_portfolio_trails_simulation_sorted_df = uncorrelated_weited_portfolio_trails_simulation_df.sort_values(by='sharpes_ratio', ascending=False)
    uncorrelated_weighted_portfolio_trails_simulation_sorted_df = uncorrelated_weited_portfolio_trails_simulation_sorted_df.reset_index(drop=True)

    top1_sharpe_ratio_value = uncorrelated_weighted_portfolio_trails_simulation_sorted_df['sharpes_ratio'].values[0]
    top1_E_rp_value= uncorrelated_weighted_portfolio_trails_simulation_sorted_df['E_rp'].values[0]
    top1_σp_value = uncorrelated_weighted_portfolio_trails_simulation_sorted_df['σp'].values[0]

    # selecting the optimal portfolios:portfolios with expected return higher or equal the minimun risky portfolio
    if selected_col == 'sharpes_ratio':
        uncorrelated_weighted_portfolio_trails_simulation_selected_sharpes_ratio_optimal_portfolios_df = \
        uncorrelated_weighted_portfolio_trails_simulation_sorted_df[uncorrelated_weighted_portfolio_trails_simulation_sorted_df[selected_col] >= top1_sharpe_ratio_value] 
    elif selected_col == 'E_rp':
        uncorrelated_weighted_portfolio_trails_simulation_selected_sharpes_ratio_optimal_portfolios_df = uncorrelated_weighted_portfolio_trails_simulation_sorted_df
                
    # sort the optimal portfolio data frame
    uncorrelated_weighted_portfolio_trails_simulation_selected_sharpes_ratio_optimal_portfolios_df = \
    uncorrelated_weighted_portfolio_trails_simulation_selected_sharpes_ratio_optimal_portfolios_df.sort_values(by='σp', ascending=True)
    uncorrelated_weighted_portfolio_trails_simulation_selected_sharpes_ratio_optimal_portfolios_df = \
    uncorrelated_weighted_portfolio_trails_simulation_selected_sharpes_ratio_optimal_portfolios_df.reset_index(drop=True)
    return uncorelated_portfolio_trails_simulation_selected_sharpes_ratio_optimal_portfolios_df

#----------------------------------------------------------------------------------------
def efficient_frontiere_optimal_sharpe_ratio_portfolios_model_points(uncorrelated_weighted_portfolio_trails_simulation_df,number_of_top_points = 35):
    #sort from maximum sharpe ratio and get top sharpe ratio portfolios
    portfolio_trails_simulation_sharpes_ratio_top_df = uncorrelated_weighted_portfolio_trails_simulation_df.sort_values(by='sharpes_ratio', 
                                                                                                                        ascending=False)
    portfolio_trails_simulation_sharpes_ratio_top_df = portfolio_trails_simulation_sharpes_ratio_top_df.reset_index(drop=True)
    uncorelated_portfolio_trails_simulation_sharpes_ratio_top_df =portfolio_trails_simulation_sharpes_ratio_top_df.head(number_of_top_points)
    xpoints_list = []
    ypoints_list = []
    top_sharpe_ratio_value_points_list = []

    for portfolio_number in range(number_of_top_points):
        #top shape ratio    
        top_sharpe_ratio_value_points_list.append(portfolio_trails_simulation_sharpes_ratio_top_df['sharpes_ratio'].values[portfolio_number])
        xpoints_list.append(portfolio_trails_simulation_sharpes_ratio_top_df['σp'].values[portfolio_number])
        ypoints_list.append(portfolio_trails_simulation_sharpes_ratio_top_df['E_rp'].values[portfolio_number])

    xpoints = np.array(xpoints_list)
    ypoints = np.array(ypoints_list)
    top_sharpe_ratio_value_points = np.array(top_sharpe_ratio_value_points_list)

    return xpoints, ypoints, top_sharpe_ratio_value_points

def get_maximun_return_portfolio(uncorrelated_weighted_portfolio_trails_simulation_df):
    
    portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_df = uncorrelated_weighted_portfolio_trails_simulation_df.sort_values(by='E_rp', 
                                                                                                                                ascending=False)
    portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_df = portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_df.reset_index(drop=True)

    max_E_rp_sharpe_ratio = portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_df['sharpes_ratio'].values[0]  
    max_E_rp = portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_df['E_rp'].values[0]
    max_E_rp_σp = portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_df['σp'].values[0]

    return max_E_rp_sharpe_ratio, max_E_rp, max_E_rp_σp
 
def get_maximun_risk_portfolio(uncorrelated_weighted_portfolio_trails_simulation_df):
    # here  the portfolios are sotrted from maximum risk
    portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_selecte_df = \
                                uncorrelated_weighted_portfolio_trails_simulation_df.sort_values(by='σp', ascending=False)
    portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_selecte_df = \
                                        portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_selecte_df.reset_index(drop=True)

    max_σp_E_rp_sharpe_ratio = portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_selecte_df['sharpes_ratio'].values[0]                                                                                                                  
    max_σp_E_rp = portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_selecte_df['E_rp'].values[0]
    max_σp = portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_selecte_df['σp'].values[0]
    
    return max_σp_E_rp_sharpe_ratio, max_σp_E_rp, max_σp

def get_minimum_risk_portfolio(uncorrelated_weighted_portfolio_trails_simulation_df):
    # here  the portfolios are sotrted from minimum risk 
    portfolio_trails_simulation_sharpes_ratio_minun_σp_E_rp_df = \
                                            uncorrelated_weighted_portfolio_trails_simulation_df.sort_values(by='σp', ascending=True)
    portfolio_trails_simulation_sharpes_ratio_minun_σp_E_rp_df = portfolio_trails_simulation_sharpes_ratio_minun_σp_E_rp_df.reset_index(drop=True)
    minimun_σp_E_rp_sharpe_ratio = portfolio_trails_simulation_sharpes_ratio_minun_σp_E_rp_df['sharpes_ratio'].values[0]
    minimun_σp_E_rp = portfolio_trails_simulation_sharpes_ratio_minun_σp_E_rp_df['E_rp'].values[0]
    minimun_σp = portfolio_trails_simulation_sharpes_ratio_minun_σp_E_rp_df['σp'].values[0]
    return minimun_σp_E_rp_sharpe_ratio, minimun_σp_E_rp, minimun_σp

def get_maximum_sharpe_ratio(uncorrelated_weighted_portfolio_trails_simulation_df):
    #sort from maximum sharpe ratio and get top sharpe ratio portfolios
    portfolio_trails_simulation_sharpes_ratio_top_df = \
                                        uncorrelated_weighted_portfolio_trails_simulation_df.sort_values(by='sharpes_ratio', ascending=False)
    portfolio_trails_simulation_sharpes_ratio_top_df = portfolio_trails_simulation_sharpes_ratio_top_df.reset_index(drop=True)
    maximum_sharpe_ratio = portfolio_trails_simulation_sharpes_ratio_top_df['sharpes_ratio'].values[0]
    maximum_sharpe_ratio_σp_E_rp = portfolio_trails_simulation_sharpes_ratio_top_df['E_rp'].values[0]
    maximum_sharpe_ratio_σp = portfolio_trails_simulation_sharpes_ratio_top_df['σp'].values[0]
    return maximum_sharpe_ratio, maximum_sharpe_ratio_σp_E_rp, maximum_sharpe_ratio_σp


#-----------------------------------------------------------------------------------------------------------------------------------
def efficient_frontiere_optimal_portfolios_model_points(uncorrelated_weighted_portfolio_trails_simulation_df,number_of_top_points = 35):
      
    #number_of_top_points = 35   
    #sort from maximum sharpe ratio and get top sharpe ratio portfolios
    portfolio_trails_simulation_sharpes_ratio_top_df = uncorrelated_weighted_portfolio_trails_simulation_df.sort_values(by='sharpes_ratio', ascending=False)
    portfolio_trails_simulation_sharpes_ratio_top_df = portfolio_trails_simulation_sharpes_ratio_top_df.reset_index(drop=True)
    uncorelated_portfolio_trails_simulation_sharpes_ratio_top_df =portfolio_trails_simulation_sharpes_ratio_top_df.head(number_of_top_points)
    
    # minimum risk portfolio: here  the portfolios are sotrted from minimum risk 
    portfolio_trails_simulation_sharpes_ratio_minun_σp_E_rp_df = uncorrelated_weighted_portfolio_trails_simulation_df.sort_values(by='σp', ascending=True)
    portfolio_trails_simulation_sharpes_ratio_minun_σp_E_rp_df = portfolio_trails_simulation_sharpes_ratio_minun_σp_E_rp_df.reset_index(drop=True)
    portfolio_trails_simulation_sharpes_ratio_minun_σp_E_rp_df = portfolio_trails_simulation_sharpes_ratio_minun_σp_E_rp_df.head(number_of_top_points)
    
    minimun_σp_E_rp_sharpe_ratio = portfolio_trails_simulation_sharpes_ratio_minun_σp_E_rp_df['sharpes_ratio'].values[0]
    minimun_σp_E_rp = portfolio_trails_simulation_sharpes_ratio_minun_σp_E_rp_df['E_rp'].values[0]
    minimun_σp = portfolio_trails_simulation_sharpes_ratio_minun_σp_E_rp_df['σp'].values[0]
    
    # maximun return portfolio
    portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_df = uncorrelated_weighted_portfolio_trails_simulation_df.sort_values(by='E_rp', ascending=False)
    portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_df = portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_df.reset_index(drop=True)
    portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_df = portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_df.head(number_of_top_points)
    
    max_E_rp_sharpe_ratio = portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_df['sharpes_ratio'].values[0]  
    max_E_rp = portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_df['E_rp'].values[0]
    max_E_rp_σp = portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_df['σp'].values[0]
    
    # maximun risk portfolio: here  the portfolios are sotrted from maximum risk
    portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_selecte_df = uncorrelated_weighted_portfolio_trails_simulation_df.sort_values(by='σp', ascending=False)
    portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_selecte_df = portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_selecte_df.reset_index(drop=True)
    portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_selecte_df = portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_selecte_df.head(number_of_top_points)
    
    
        
    xpoints_list = []
    ypoints_list = []
    top_sharpe_ratio_value_points_list = []
    
    for portfolio_number in range(number_of_top_points):
        
        #top shape ratio    
        top_sharpe_ratio_value_points_list.append(portfolio_trails_simulation_sharpes_ratio_top_df['sharpes_ratio'].values[portfolio_number])
        xpoints_list.append(portfolio_trails_simulation_sharpes_ratio_top_df['σp'].values[portfolio_number])
        ypoints_list.append(portfolio_trails_simulation_sharpes_ratio_top_df['E_rp'].values[portfolio_number])
        
        # minimum risk portfolio:
        top_sharpe_ratio_value_points_list.append(portfolio_trails_simulation_sharpes_ratio_minun_σp_E_rp_df['sharpes_ratio'].values[portfolio_number])
        xpoints_list.append(portfolio_trails_simulation_sharpes_ratio_minun_σp_E_rp_df['σp'].values[portfolio_number])
        ypoints_list.append(portfolio_trails_simulation_sharpes_ratio_minun_σp_E_rp_df['E_rp'].values[portfolio_number])
        
        # maximun return portfolio
        top_sharpe_ratio_value_points_list.append(portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_df['sharpes_ratio'].values[portfolio_number])  
        xpoints_list.append(portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_df['σp'].values[portfolio_number])
        ypoints_list.append(portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_df['E_rp'].values[portfolio_number])
        
     
    xpoints = np.array(xpoints_list)
    ypoints = np.array(ypoints_list)
    top_sharpe_ratio_value_points = np.array(top_sharpe_ratio_value_points_list)
    
    
    return xpoints.clip(minimun_σp,max_E_rp_σp),ypoints.clip(minimun_σp_E_rp,max_E_rp), \
                    top_sharpe_ratio_value_points.clip(minimun_σp_E_rp_sharpe_ratio,max_E_rp_sharpe_ratio)

def get_maximun_minimum_points(df):
    
    # maximun return portfolio
    max_df = df.sort_values(by='E_rp', ascending=False)
    max_df = max_df.reset_index(drop=True)
    max_df = max_df.head(1)
    max_E_rp_σp = max_df['σp']
    max_E_rp    = max_df['E_rp']
    max_E_rp_sharpe_ratio = max_df['sharpes_ratio']
    
    # minimum return portfolio
    min_df = df.sort_values(by='E_rp', ascending=True)
    min_df = min_df.reset_index(drop=True)
    min_df = min_df.head(1)
    minimun_σp = min_df['σp']
    minimun_σp_E_rp    = min_df['E_rp']
    minimun_σp_E_rp_sharpe_ratio = min_df['sharpes_ratio']
    
    return max_E_rp_σp, max_E_rp, max_E_rp_sharpe_ratio, minimun_σp, minimun_σp_E_rp, minimun_σp_E_rp_sharpe_ratio

#-----------------------------------------Efficient Frontiere Model Plotting-----------------------------------------------------------------------------------
#call efficient_frontiere_optimal_portfolios_df to include sharpe ration dataframe to the trails protfolios dataframe
#and select the optimal portfolios 
#prepare data for plotting and create the scatter plot
#include sharpe ration dataframe to the trails protfolios dataframe
#select the optimal portfolios(portfolios with expected return higher or equal to the minimumal risk portfolio)
#sorted by sharpe ration efficient_frontiere_selected_sharpe_ratio_portfolio_df
#------------------------------------------------------------------------------------------------------------------------------
def plot_fitted_curve(uncorrelated_weighted_portfolio_trails_simulation_df,fig, ax, label, marker, color ):
    #points plotting
       
    xpoints,ypoints,top_sharpe_ratio_value_points = \
                efficient_frontiere_optimal_portfolios_model_points(uncorrelated_weighted_portfolio_trails_simulation_df,7)
    
    row, col = uncorrelated_weighted_portfolio_trails_simulation_df.shape
    #--model definition---
    mymodel = np.poly1d(np.polyfit(xpoints, ypoints,2))
    popt = np.polyfit(xpoints, ypoints,2)
    a, b, c = popt
    poly_d2_form = str('y =%.5f * x^2 + %.5f * x + %.5f' % (a, b, c))
    display(np.polyfit(xpoints, ypoints,2))
    myline = np.linspace(xpoints.min(), xpoints.max(), row) 
    
    # optimal portfolios plotting
    ypred = mymodel(myline)
    ax.plot(xpoints,ypoints,'*',color='red',label='Optimal portfolios')
    ax.plot(myline, mymodel(myline),'.',color="blue",label=label + ':\n'+poly_d2_form)
    print(r2_score(ypoints, mymodel(xpoints)))
    
def plot_random_portfolios(uncorrelated_weighted_portfolio_trails_simulation_df, fig, ax, colorbar = 'yes'):
   
    #random portfolio plotting
    optimal_portfolios_df = uncorrelated_weighted_portfolio_trails_simulation_df
    sharpes_ratio_optimal_portfolios_σp_col = optimal_portfolios_df['σp']
    sharpes_ratio_optimal_portfolios_E_rp_col = optimal_portfolios_df['E_rp']
    optimal_portfolios_sharpes_ratio_col = optimal_portfolios_df['sharpes_ratio']
                                                                 
    scplt = ax.scatter(sharpes_ratio_optimal_portfolios_σp_col, sharpes_ratio_optimal_portfolios_E_rp_col, marker="o",
                       c=optimal_portfolios_sharpes_ratio_col, cmap="viridis",label='Random Portfolios')
    if colorbar == 'yes':
        cb = fig.colorbar(scplt, ax=ax, label='Sharpe Ratio')
    ax.set_title("Towards an Efficient Frontier Model - Random portfolios Efficient Frontier")

    
def plot_fitted_curve_and_random_portfolios(uncorrelated_weighted_portfolio_trails_simulation_df):
    fig, ax =plt.subplots(figsize=(12, 5))                                                               
    plot_fitted_curve(uncorrelated_weighted_portfolio_trails_simulation_df,fig, ax, label='Model to Approximate', marker= '*', color='red')                                                           
    plot_random_portfolios(uncorrelated_weighted_portfolio_trails_simulation_df, fig, ax)                                                             
    ax.legend(prop = { "size": 8 })   

#plot_fitted_curve_and_random_portfolios(uncorrelated_weighted_portfolio_trails_simulation_df)
In [19]:
#----------------------------------------------------------------------------------
# minimum risk portfolio: here  the portfolios are sotrted from minimum risk 
#-----------------------------------------------------------------------------------
def portfolio_strategy_minimum_risk(uncorrelated_weighted_portfolio_trails_simulation_df,number_of_top_points, most_diversify_portfolio_assets_list):
    
    portfolio_trails_simulation_minimum_risk_σp_E_rp_df = uncorrelated_weighted_portfolio_trails_simulation_df.sort_values(
                                                        by='σp', ascending=True)
    portfolio_trails_simulation_minimum_risk_σp_E_rp_df = portfolio_trails_simulation_minimum_risk_σp_E_rp_df.reset_index(drop=True)
    portfolio_trails_simulation_minimum_risk_σp_E_rp_df = portfolio_trails_simulation_minimum_risk_σp_E_rp_df.head(number_of_top_points)
    
    portfolio_weight_df = portfolio_trails_simulation_minimum_risk_σp_E_rp_df[most_diversify_portfolio_assets_list]
    
    portfolio_weight_df = portfolio_weight_df*100
        
    portfolio_weight_df1 = portfolio_weight_df.head(1)
    portfolio_weight =portfolio_weight_df1.columns.values.tolist()
    asset_tickers = portfolio_weight_df1.iloc[0].tolist()
    portfolio_investment_strategy_df = pd.DataFrame({'Portfolio Weight':portfolio_weight,'Asset Tickers':asset_tickers})
    portfolio_investment_strategy_df = portfolio_investment_strategy_df.sort_values(by='Asset Tickers',ascending=True)
    
    portfolio_investment_strategy_Trans_df = portfolio_investment_strategy_df.transpose()
    strategy_Weight = portfolio_investment_strategy_df['Portfolio Weight']
    strategy_tickers = portfolio_investment_strategy_df['Asset Tickers']
    
    return strategy_Weight, strategy_tickers, portfolio_trails_simulation_minimum_risk_σp_E_rp_df
    
#----------------------------------------------------------------------------
#maximun risk portfolio: here  the portfolios are sotrted from minimum risk
#----------------------------------------------------------------------------
def portfolio_strategy_maximun_risk(uncorrelated_weighted_portfolio_trails_simulation_df,number_of_top_points):
    #log_returns,threshold
    portfolio_trails_simulation_max_risk_σp_E_rp_df = uncorrelated_weighted_portfolio_trails_simulation_df.sort_values(
                                                        by='σp', ascending=False)
    portfolio_trails_simulation_max_risk_σp_E_rp_df = portfolio_trails_simulation_max_risk_σp_E_rp_df.reset_index(drop=True)
    portfolio_trails_simulation_max_risk_σp_E_rp_df = portfolio_trails_simulation_max_risk_σp_E_rp_df.head(number_of_top_points)
    portfolio_weight_df = portfolio_trails_simulation_max_risk_σp_E_rp_df[most_diversify_portfolio_assets_list]
    portfolio_weight_df = portfolio_weight_df*100
        
    portfolio_weight_df1 = portfolio_weight_df.head(1)
    portfolio_weight =portfolio_weight_df1.columns.values.tolist()
    asset_tickers = portfolio_weight_df1.iloc[0].tolist()
    portfolio_investment_strategy_df = pd.DataFrame({'Portfolio Weight':portfolio_weight,'Asset Tickers':asset_tickers})
    portfolio_investment_strategy_df = portfolio_investment_strategy_df.sort_values(by='Asset Tickers',ascending=True)
    
    portfolio_investment_strategy_Trans_df = portfolio_investment_strategy_df.transpose()
    #display(portfolio_investment_strategy_Trans_df)
    strategy_Weight = portfolio_investment_strategy_df['Portfolio Weight']
    strategy_tickers = portfolio_investment_strategy_df['Asset Tickers']
    
    return strategy_Weight, strategy_tickers, portfolio_trails_simulation_max_risk_σp_E_rp_df

#---------------------------------
# maximun return portfolio
#--------------------------------
def portfolio_strategy_maximun_return(uncorrelated_weighted_portfolio_trails_simulation_df,number_of_top_points):
    
    portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_df = uncorrelated_weighted_portfolio_trails_simulation_df.sort_values(
                                                        by='E_rp', ascending=False)
    portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_df = portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_df.reset_index(drop=True)
    portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_df = portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_df.head(number_of_top_points)
    
    portfolio_weight_df = portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_df[most_diversify_portfolio_assets_list]
    
    portfolio_weight_df = portfolio_weight_df*100
        
    portfolio_weight_df1 = portfolio_weight_df.head(1)
    portfolio_weight =portfolio_weight_df1.columns.values.tolist()
    asset_tickers = portfolio_weight_df1.iloc[0].tolist()
    portfolio_investment_strategy_df = pd.DataFrame({'Portfolio Weight':portfolio_weight,'Asset Tickers':asset_tickers})
    portfolio_investment_strategy_df = portfolio_investment_strategy_df.sort_values(by='Asset Tickers',ascending=True)
    
    portfolio_investment_strategy_Trans_df = portfolio_investment_strategy_df.transpose()
    #display(portfolio_investment_strategy_Trans_df)
    strategy_Weight = portfolio_investment_strategy_df['Portfolio Weight']
    strategy_tickers = portfolio_investment_strategy_df['Asset Tickers']
    
    return strategy_Weight, strategy_tickers, portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_df

#-------------------------------------------------------------------------
#sort from maximum sharpe ration and get top sharpe ratio portfolios
#-------------------------------------------------------------------------
def portfolio_strategy_top_sharpe_ratio(uncorrelated_weighted_portfolio_trails_simulation_df,number_of_top_points):
    
    portfolio_trails_simulation_sharpes_ratio_top_df = uncorrelated_weighted_portfolio_trails_simulation_df.sort_values(
                                                        by='sharpes_ratio', ascending=False)
    portfolio_trails_simulation_sharpes_ratio_top_df = portfolio_trails_simulation_sharpes_ratio_top_df.reset_index(drop=True)
    uncorrelated_portfolio_trails_simulation_sharpes_ratio_top_df = portfolio_trails_simulation_sharpes_ratio_top_df.head(number_of_top_points)
    
    portfolio_weight_df = uncorrelated_portfolio_trails_simulation_sharpes_ratio_top_df[most_diversify_portfolio_assets_list]
    portfolio_weight_df = portfolio_weight_df*100
        
    portfolio_weight_df1 = portfolio_weight_df.head(1)
    portfolio_weight =portfolio_weight_df1.columns.values.tolist()
    asset_tickers = portfolio_weight_df1.iloc[0].tolist()
    portfolio_investment_strategy_df = pd.DataFrame({'Portfolio Weight':portfolio_weight,'Asset Tickers':asset_tickers})
    portfolio_investment_strategy_df = portfolio_investment_strategy_df.sort_values(by='Asset Tickers',ascending=True)
    
    portfolio_investment_strategy_Trans_df = portfolio_investment_strategy_df.transpose()
    #display(portfolio_investment_strategy_Trans_df)
    strategy_Weight = portfolio_investment_strategy_df['Portfolio Weight']
    strategy_tickers = portfolio_investment_strategy_df['Asset Tickers']
    
    return strategy_Weight, strategy_tickers, uncorrelated_portfolio_trails_simulation_sharpes_ratio_top_df


#------------------------------------------------------------------------
#sort from maximum sharpe ratio and get top sharpe ratio portfolios
#------------------------------------------------------------------------
def portfolio_strategy_plotting(uncorrelated_weighted_portfolio_trails_simulation_df, most_diversify_portfolio_assets_list, number_of_top_points):
    fig, ax =plt.subplots(2,2, figsize=(14, 10))
    
    strategy_Weight, strategy_tickers,uncorrelated_portfolio_trails_simulation_sharpes_ratio_top_df = \
        portfolio_strategy_top_sharpe_ratio(uncorrelated_weighted_portfolio_trails_simulation_df, number_of_top_points)
   
    bar_container= ax[0,0].barh(strategy_Weight,strategy_tickers)
    # setting label of y-axis
    ax[0,0].set_ylabel("Asset Tickers")
    # setting label of x-axis
    #ax[0,0].set_xlabel("Portfolio Weight") 
    ax[0,0].set_title("Maximum Sharpe Ratio Portfolio Assets Allocation")
    ax[0,0].bar_label(bar_container, fmt='{:,.0f}%')
   
    
    # maximun return portfolio
    strategy_Weight, strategy_tickers,portfolio_trails_simulation_sharpes_ratio_max_σp_E_rp_df = \
        portfolio_strategy_maximun_return(uncorrelated_weighted_portfolio_trails_simulation_df, number_of_top_points) 
    bar_container= ax[0,1].barh(strategy_Weight,strategy_tickers)
 
    # setting label of y-axis
    ax[0,1].set_ylabel("Asset Tickers")
    # setting label of x-axis
    #ax[0,1].set_xlabel("Portfolio Weight") 
    ax[0,1].set_title("Maximun Return Portfolio Assets Allocation")
    ax[0,1].bar_label(bar_container, fmt='{:,.0f}%')
    
     # maximun risk portfolio: here  the portfolios are sotrted from minimum risk
    strategy_Weight, strategy_tickers,portfolio_trails_simulation_max_risk_σp_E_rp_df = \
        portfolio_strategy_maximun_risk(uncorrelated_weighted_portfolio_trails_simulation_df, number_of_top_points) 
    
    bar_container= ax[1,0].barh(strategy_Weight,strategy_tickers)
    # setting label of y-axis
    ax[1,0].set_ylabel("Asset tickers")
    # setting label of x-axis
    #ax[1,0].set_xlabel("Portfolio Weight") 
    ax[1,0].set_title("Maximun risk Portfolio Assets Allocation")
    ax[1,0].bar_label(bar_container, fmt='{:,.0f}%')
    
    # minimum risk portfolio: here  the portfolios are sotrted from maximun risk 
    strategy_Weight, strategy_tickers,portfolio_trails_simulation_minimum_risk_σp_E_rp_df = \
        portfolio_strategy_minimum_risk(uncorrelated_weighted_portfolio_trails_simulation_df, number_of_top_points, most_diversify_portfolio_assets_list)
    
    bar_container= ax[1,1].barh(strategy_Weight,strategy_tickers)
    # setting label of y-axis
    ax[1,1].set_ylabel("Asset Tickers")
    # setting label of x-axis
    #ax[1,1].set_xlabel("Portfolio Weight") 
    ax[1,1].set_title("Minimum risk Portfolio Assets Allocation")
    ax[1,1].bar_label(bar_container, fmt='{:,.0f}%')
    
    plt.show()

plot_fitted_curve_and_random_portfolios(uncorrelated_weighted_portfolio_trails_simulation_df)    
portfolio_strategy_plotting(uncorrelated_weighted_portfolio_trails_simulation_df, most_diversify_portfolio_assets_list, 10) 
array([-0.06841505,  0.23859677, -0.14919578])
0.9642722705586393

Efficient Frontier modelling using Machine Learning techniques¶

Data Splitting / Model Selection¶

In [20]:
# Build the model
def model_poly_d2(x, a, b, c):
    return b * x**2 + a * x + c

# Data Splitting / Model Selection
def polynomial_degree2_model(uncorrelated_weighted_portfolio_trails_simulation_df):
     # Load the data : original random portfolios data points
    xpoints, ypoints, original_random_sharpe_ratio = \
    efficient_frontiere_optimal_portfolios_model_points( uncorrelated_weighted_portfolio_trails_simulation_df)
    
    #Split tranning, validation and testing data  
    x_train_poly_d2, x_test_poly_d2, y_train_poly_d2, y_test_poly_d2 = \
    train_test_split(xpoints, ypoints, test_size=0.3, random_state=42)
    x_model_validation_poly_d2, x_model_testing_poly_d2 = train_test_split(np.linspace(min(xpoints), max(xpoints), 
                                                                                       len(xpoints)), test_size=0.3, random_state=42)
   
    # model traning to get paarameters
    popt_poly_d2, pcov_poly_d2 = curve_fit(model_poly_d2, x_train_poly_d2,y_train_poly_d2, maxfev=50000)   
    a, b, c = popt_poly_d2
    
    poly_d2_form  = str('y =%.5f * x^2 + %.5f * x + %.5f' % (a, b, c))
    
    return x_train_poly_d2, x_test_poly_d2, y_train_poly_d2, y_test_poly_d2,popt_poly_d2, \
                pcov_poly_d2, x_model_validation_poly_d2, x_model_testing_poly_d2, model_poly_d2, poly_d2_form

#x_train_poly_d2, x_test_poly_d2, y_train_poly_d2, y_test_poly_d2,popt_poly_d2, \
#                pcov_poly_d2, x_model_validation_poly_d2, x_model_testing_poly_d2, model_poly_d2, poly_d2_form = \
#                                            polynomial_degree2_model(uncorrelated_weighted_portfolio_trails_simulation_df)
In [21]:
 # Build the model
def model_poly_d3_log(x, a, b, c, d, e):
    return a * np.log(abs(b )* x) + c*x**3 +d*x**2 + e
    
def polynomial_degree3_log_model(uncorrelated_weighted_portfolio_trails_simulation_df):
    # Load the data : original random portfolios data points
    xpoints,ypoints,original_random_sharpe_ratio = efficient_frontiere_optimal_portfolios_model_points(uncorrelated_weighted_portfolio_trails_simulation_df)
        
    #Split tranning and testing data  
    x_train_poly_d3_log, x_test_poly_d3_log, y_train_poly_d3_log, y_test_poly_d3_log = \
                                                    train_test_split(xpoints, ypoints, test_size=0.3, random_state=42)
    x_model_validation_poly_d3_log, x_model_testing_poly_d3_log = \
                                train_test_split(np.linspace(min(xpoints), max(xpoints), len(xpoints)), test_size=0.3, random_state=42)
    
    # model validation data
    #x_model_validation = np.linspace(min(x_train), max(x_train), number_of_top_points*3) 
    
    # model traning to get parameters
    popt_poly_d3_log, pcov_poly_d3_log = curve_fit(model_poly_d3_log, x_train_poly_d3_log,y_train_poly_d3_log, maxfev=50000)   
    a, b, c, d, e = popt_poly_d3_log 
    
    poly_d3_log_form = str('y =%.5f * np.log( %.5f*x) + %.5f * x**3 + %.5f * x + %.5f' % (a, b, c, d, e))

    
    return x_train_poly_d3_log, x_test_poly_d3_log, y_train_poly_d3_log, y_test_poly_d3_log,popt_poly_d3_log, pcov_poly_d3_log, \
                                x_model_validation_poly_d3_log, x_model_testing_poly_d3_log, model_poly_d3_log, poly_d3_log_form

#x_train_poly_d3_log, x_test_poly_d3_log, y_train_poly_d3_log, y_test_poly_d3_log,popt_poly_d3_log, pcov_poly_d3_log, \
#                                x_model_validation_poly_d3_log, x_model_testing_poly_d3_log, model_poly_d3_log, poly_d3_log_form = \
#                                                            polynomial_degree3_log_model(uncorrelated_weighted_portfolio_trails_simulation_df)
In [22]:
# Build the model
def model_poly_d5_log(x, a, b, c):
    return a*np.log(abs(b)*x) + c*x**5
    
def polynomial_degree5_log_model(uncorrelated_weighted_portfolio_trails_simulation_df):
     # Load the data : original random portfolios data points
    xpoints,ypoints,original_random_sharpe_ratio = efficient_frontiere_optimal_portfolios_model_points(uncorrelated_weighted_portfolio_trails_simulation_df)
        
    #Split tranning and testing data  
    x_train_poly_d5_log, x_test_poly_d5_log, y_train_poly_d5_log, y_test_poly_d5_log = \
                                        train_test_split(xpoints, ypoints, test_size=0.3, random_state=42)
    x_model_validation_poly_d5_log, x_model_testing_poly_d5_log = \
                                train_test_split(np.linspace(min(xpoints), max(xpoints), len(xpoints)), test_size=0.3, random_state=42)

    popt_poly_d5_log, pcov_poly_d5_log = curve_fit(model_poly_d5_log, x_train_poly_d5_log,y_train_poly_d5_log, maxfev=50000)   
    a, b, c = popt_poly_d5_log
    
    poly_d5_log_form = str('y =%.5f * np.log( %.5f*x) + %.5f * x**5' % (a, b, c))
    
    return x_train_poly_d5_log, x_test_poly_d5_log, y_train_poly_d5_log, y_test_poly_d5_log, popt_poly_d5_log, pcov_poly_d5_log, \
                                x_model_validation_poly_d5_log, x_model_testing_poly_d5_log, model_poly_d5_log, poly_d5_log_form

#x_train_poly_d5_log, x_test_poly_d5_log, y_train_poly_d5_log, y_test_poly_d5_log, popt_poly_d5_log, pcov_poly_d5_log, \
#                                x_model_validation_poly_d5_log, x_model_testing_poly_d5_log, model_poly_d5_log, poly_d5_log_form = \
#                polynomial_degree5_log_model(uncorrelated_weighted_portfolio_trails_simulation_df)
In [23]:
def models_plotting(x, y, uncorrelated_weighted_portfolio_trails_simulation_df, fig, ax, model_form):                                                                                 
    #------Random portfolio data plotting
    plot_random_portfolios(uncorrelated_weighted_portfolio_trails_simulation_df, fig, ax,'no')
    cspl = ax.scatter(x=x, y=y, c=y/x, cmap="viridis",label='Efficient Frontier:\n'+model_form)
    #-----------model to approximate
    plot_fitted_curve(uncorrelated_weighted_portfolio_trails_simulation_df,fig, ax, label='Fitted Curve', marker= '*', color='red')
    
    ax.legend(bbox_to_anchor=(0.72, 1.38), ncol=1, prop = { "size": 8})
    return cspl
In [24]:
def dataframe_clipping(x_σp, y_E_rp, y_E_rp_pred ):
    
    clipped_df = pd.DataFrame({'σp':x_σp,'E_rp':y_E_rp,'y_E_rp_pred':y_E_rp_pred,'error':y_E_rp_pred - y_E_rp})
    clipped_df = clipped_df.sort_values(by='error',ascending=False)
    clipped_df['y_optimal_E_rp'] = np.where(clipped_df['E_rp'] <= clipped_df['y_E_rp_pred'], clipped_df['E_rp'],clipped_df['y_E_rp_pred'] )
    clipped_df['sharpes_ratio'] = clipped_df['y_optimal_E_rp']/clipped_df['σp']
    return clipped_df[clipped_df['error'] >= 0]
    
def model_uperBound_efficient_frontier( uncorrelated_weighted_portfolio_trails_simulation_df, model, model_popt, 
                                       ax , mode_form, random_points = 0):
    
    optimal_portfolios_df = uncorrelated_weighted_portfolio_trails_simulation_df
    x_σp = uncorrelated_weighted_portfolio_trails_simulation_df['σp']
    y_E_rp = uncorrelated_weighted_portfolio_trails_simulation_df['E_rp']
    row, col = uncorrelated_weighted_portfolio_trails_simulation_df.shape
    
    #here the original data frame is clipped to eliminate the upper bound Outlier 
    y_E_rp_pred = model(x_σp, *model_popt)
    clipped_df = dataframe_clipping(x_σp, y_E_rp, y_E_rp_pred )
     
    xpoints,ypoints,top_sharpe_ratio_value_points = efficient_frontiere_optimal_portfolios_model_points(clipped_df,7) 
     
    #------Random portfolio data plotting
    if random_points == 0:
        
        scplt = ax.scatter(clipped_df['σp'], clipped_df['E_rp'], marker="o", c=clipped_df['E_rp']/clipped_df['σp'], 
                       cmap="viridis",label='Random Portfolios')
        
    else:        
        xrandom_points,yrandom_points,random_sharpe_ratio_value_points = \
                        efficient_frontiere_optimal_sharpe_ratio_portfolios_model_points(clipped_df,random_points)
        scplt = ax.scatter(x=xrandom_points, y=yrandom_points, marker="o", c= random_sharpe_ratio_value_points, 
                       cmap="viridis",label='Random Portfolios')
        
    
    #efficient frontier plotting 
    x_model_σp = np.linspace(xpoints.min(), xpoints.max(), row)
    y_model_E_rp_pred = model(x_model_σp, *model_popt)
    cspl = ax.scatter(x=x_model_σp, y=y_model_E_rp_pred, marker="*", c= y_E_rp_pred/x_model_σp,
                      cmap="viridis",label='Efficient Frontier:\n'+mode_form)
     
    ax.set_title("Boundary Random portfolios Efficient Frontier")
    ax.legend(bbox_to_anchor=(0.72, 1.38), ncol=1, prop = { "size": 8})
        
    return scplt

Model Evaluation¶

Model validation¶

we validate the model(model parameters) on the training and the validation data.

In [25]:
def evalute_model_parameters(uncorrelated_weighted_portfolio_trails_simulation_df):
    
    #polynoial degree 2 model  b * x**2 + a * x + c
    x_train_poly_d2, x_test_poly_d2, y_train_poly_d2, y_test_poly_d2,popt_poly_d2, \
                pcov_poly_d2, x_model_validation_poly_d2, x_model_testing_poly_d2, model_poly_d2, poly_d2_form = \
                                                            polynomial_degree2_model(uncorrelated_weighted_portfolio_trails_simulation_df)
    # model parameters
    a, b, c = popt_poly_d2
    #model prediction
    y_model_validation_pred_poly_d2 = model_poly_d2(x_model_validation_poly_d2, a, b, c)
      
    #polynomial degree 3 log model: a * np.log(b * x) + c*x**3 +d*x**2 + e
    x_train_poly_d3_log, x_test_poly_d3_log, y_train_poly_d3_log, y_test_poly_d3_log,popt_poly_d3_log, pcov_poly_d3_log, \
                                x_model_validation_poly_d3_log, x_model_testing_poly_d3_log, model_poly_d3_log, poly_d3_log_form = \
                                                            polynomial_degree3_log_model(uncorrelated_weighted_portfolio_trails_simulation_df)
    # model parameters
    a, b, c, d, e = popt_poly_d3_log
    y_model_validation_pred_poly_d3_log = model_poly_d3_log(x_model_validation_poly_d3_log, a, abs(b), c, d, e) 
        
    #polynomial degree 5 log model: a*np.log(b*x) + c*x**5 
    x_train_poly_d5_log, x_test_poly_d5_log, y_train_poly_d5_log, y_test_poly_d5_log, popt_poly_d5_log, pcov_poly_d5_log, \
                                x_model_validation_poly_d5_log, x_model_testing_poly_d5_log, model_poly_d5_log, poly_d5_log_form = \
                polynomial_degree5_log_model(uncorrelated_weighted_portfolio_trails_simulation_df)
    # model parameters
    a, b, c = popt_poly_d5_log
    y_model_validation_pred_poly_d5_log = model_poly_d5_log(x_model_validation_poly_d5_log, a,abs(b), c)
    

    return popt_poly_d2, popt_poly_d3_log, popt_poly_d5_log, x_model_validation_poly_d2,x_model_validation_poly_d3_log, \
           x_model_validation_poly_d5_log, y_train_poly_d2, y_model_validation_pred_poly_d2, y_train_poly_d3_log, \
           y_model_validation_pred_poly_d3_log, y_train_poly_d5_log, y_model_validation_pred_poly_d5_log, model_poly_d2, \
           model_poly_d3_log, model_poly_d5_log, poly_d2_form, poly_d3_log_form, poly_d5_log_form
       
#popt_poly_d2, popt_poly_d3_log, popt_poly_d5_log, x_model_validation_poly_d2,x_model_validation_poly_d3_log, \
#x_model_validation_poly_d5_log, y_train_poly_d2, y_model_validation_pred_poly_d2, y_train_poly_d3_log, \
#y_model_validation_pred_poly_d3_log, y_train_poly_d5_log, y_model_validation_pred_poly_d5_log, model_poly_d2, \
#model_poly_d3_log, model_poly_d5_log, poly_d2_form, poly_d3_log_form, poly_d5_log_form = \
#                                        evalute_model_parameters(uncorrelated_weighted_portfolio_trails_simulation_df)
In [26]:
def model_validation_plotting(uncorrelated_weighted_portfolio_trails_simulation_df):
     
    fig, ax =plt.subplots(2,2,figsize=(13, 13), constrained_layout=True) 
    
    popt_poly_d2, popt_poly_d3_log, popt_poly_d5_log, x_model_validation_poly_d2,x_model_validation_poly_d3_log, \
    x_model_validation_poly_d5_log, y_train_poly_d2, y_model_validation_pred_poly_d2, y_train_poly_d3_log, \
    y_model_validation_pred_poly_d3_log, y_train_poly_d5_log, y_model_validation_pred_poly_d5_log, model_poly_d2, \
    model_poly_d3_log, model_poly_d5_log, poly_d2_form, poly_d3_log_form, poly_d5_log_form = \
                                        evalute_model_parameters(uncorrelated_weighted_portfolio_trails_simulation_df)

    cspl1 = models_plotting(x_model_validation_poly_d2, y_model_validation_pred_poly_d2, 
                           uncorrelated_weighted_portfolio_trails_simulation_df, fig, ax[0,0], poly_d2_form)
    cspl2 = models_plotting(x_model_validation_poly_d3_log, y_model_validation_pred_poly_d3_log, 
                               uncorrelated_weighted_portfolio_trails_simulation_df, fig, ax[0,1],poly_d3_log_form)
    cspl = models_plotting(x_model_validation_poly_d5_log, y_model_validation_pred_poly_d5_log, 
                               uncorrelated_weighted_portfolio_trails_simulation_df, fig, ax[1,0], poly_d5_log_form)
    cplt4 =  model_uperBound_efficient_frontier(uncorrelated_weighted_portfolio_trails_simulation_df, model_poly_d2,popt_poly_d2,
                                       ax[1,1], poly_d2_form)
    
    cb = fig.colorbar(cspl, ax=ax, label='Sharpe Ratio',orientation='horizontal',shrink=0.6)
    
model_validation_plotting(uncorrelated_weighted_portfolio_trails_simulation_df)    
array([-0.06841505,  0.23859677, -0.14919578])
0.9642722705586393
array([-0.06841505,  0.23859677, -0.14919578])
0.9642722705586393
array([-0.06841505,  0.23859677, -0.14919578])
0.9642722705586393

Model fine-tuning¶

In [27]:
def fine_tune_hyperparmeters(uncorrelated_weighted_portfolio_trails_simulation_df):
    
     #polynoial degree 2 model  b * x**2 + a * x + c
    x_train_poly_d2, x_test_poly_d2, y_train_poly_d2, y_test_poly_d2,popt_poly_d2, \
                pcov_poly_d2, x_model_validation_poly_d2, x_model_testing_poly_d2, model_poly_d2, poly_d2_form = \
                                             polynomial_degree2_model(uncorrelated_weighted_portfolio_trails_simulation_df)
    
    y_model_turning_pred_poly_d2 = model_poly_d2(x_model_validation_poly_d2, 0.075, -0.019, -0.007)
    poly_d2_form  = str('y =%.5f * x^2 + %.5f * x + %.5f' % (0.07, -0.016, -0.009))
    
    #polynomial degree 3 log model: a * np.log(b * x) + c*x**3 +d*x**2 + e
    x_train_poly_d3_log, x_test_poly_d3_log, y_train_poly_d3_log, y_test_poly_d3_log,popt_poly_d3_log, pcov_poly_d3_log, \
                                x_model_validation_poly_d3_log, x_model_testing_poly_d3_log, model_poly_d3_log, poly_d3_log_form = \
                                         polynomial_degree3_log_model(uncorrelated_weighted_portfolio_trails_simulation_df)
    
    y_model_tuning_pred_poly_d3_log = model_poly_d3_log(x_model_validation_poly_d3_log,  0.256, 0.348, 0.00793, -0.060, 0.343) 
    poly_d3_log_form = str('y =%.5f * np.log( %.5f*x) + %.5f * x**3 + %.5f * x + %.5f' % (0.256, 0.348, 0.00793, -0.060, 0.343))
    
    #polynomial degree 5 log model: a*np.log(b*x) + c*x**5 
    x_train_poly_d5_log, x_test_poly_d5_log, y_train_poly_d5_log, y_test_poly_d5_log, popt_poly_d5_log, pcov_poly_d5_log, \
                                x_model_validation_poly_d5_log, x_model_testing_poly_d5_log, model_poly_d5_log, poly_d5_log_form = \
                polynomial_degree5_log_model(uncorrelated_weighted_portfolio_trails_simulation_df)
    
    y_model_turning_pred_poly_d5_log = model_poly_d5_log(x_model_validation_poly_d5_log,  0.085, 1.44, -0.00058)
    poly_d5_log_form = str('y =%.5f * np.log( %.5f*x) + %.5f * x**5' % (0.085, 1.44, -0.00058))
       
    return model_poly_d2, model_poly_d3_log, model_poly_d5_log, popt_poly_d2, popt_poly_d3_log, popt_poly_d5_log, x_model_validation_poly_d2, \
           x_model_validation_poly_d3_log, x_model_validation_poly_d5_log, y_train_poly_d2, y_model_turning_pred_poly_d2, y_train_poly_d3_log, \
           y_model_tuning_pred_poly_d3_log, y_train_poly_d5_log, y_model_turning_pred_poly_d5_log, poly_d2_form, \
           poly_d3_log_form, poly_d5_log_form
    
#model_poly_d2, model_poly_d3_log, model_poly_d5_log, popt_poly_d2, popt_poly_d3_log, popt_poly_d5_log, x_model_validation_poly_d2, x_model_validation_poly_d3_log, \
#x_model_validation_poly_d5_log, y_train_poly_d2, y_model_tuning_pred_poly_d2, y_train_poly_d3_log, \
#y_model_tuning_pred_poly_d3_log, y_train_poly_d5_log, y_model_tuning_pred_poly_d5_log, poly_d2_form, \
#poly_d3_log_form, poly_d5_log_form= fine_tune_hyperparmeters(uncorrelated_weighted_portfolio_trails_simulation_df)
    
In [28]:
def model_tuning_plotting(uncorrelated_weighted_portfolio_trails_simulation_df):
    
    fig, ax =plt.subplots(2,2,figsize=(13, 13), constrained_layout=True) 
    
    print(" Models Fine-tuning ")

    model_poly_d2, model_poly_d3_log, model_poly_d5_log, popt_poly_d2, popt_poly_d3_log, popt_poly_d5_log, x_model_validation_poly_d2, x_model_validation_poly_d3_log, \
    x_model_validation_poly_d5_log, y_train_poly_d2, y_model_tuning_pred_poly_d2, y_train_poly_d3_log, \
    y_model_tuning_pred_poly_d3_log, y_train_poly_d5_log, y_model_tuning_pred_poly_d5_log, poly_d2_form, \
    poly_d3_log_form, poly_d5_log_form= fine_tune_hyperparmeters(uncorrelated_weighted_portfolio_trails_simulation_df)

    cspl1 = models_plotting(x_model_validation_poly_d2, y_model_tuning_pred_poly_d2, 
                           uncorrelated_weighted_portfolio_trails_simulation_df, fig, ax[0,0], poly_d2_form)
    cspl2 = models_plotting(x_model_validation_poly_d3_log, y_model_tuning_pred_poly_d3_log, 
                               uncorrelated_weighted_portfolio_trails_simulation_df, fig, ax[0,1], poly_d3_log_form)
    cspl = models_plotting(x_model_validation_poly_d5_log, y_model_tuning_pred_poly_d5_log, 
                               uncorrelated_weighted_portfolio_trails_simulation_df, fig, ax[1,0], poly_d5_log_form) 
    cplt4 = model_uperBound_efficient_frontier(uncorrelated_weighted_portfolio_trails_simulation_df, 
                                       model_poly_d2,popt_poly_d2, ax[1,1], poly_d2_form,7000)
    
    cb = fig.colorbar(cspl, ax=ax, label='Sharpe Ratio',orientation='horizontal',shrink=0.6)
    
model_tuning_plotting(uncorrelated_weighted_portfolio_trails_simulation_df)
 Models Fine-tuning 
array([-0.06841505,  0.23859677, -0.14919578])
0.9642722705586393
array([-0.06841505,  0.23859677, -0.14919578])
0.9642722705586393
array([-0.06841505,  0.23859677, -0.14919578])
0.9642722705586393

Model Testing¶

In [29]:
def test_the_model(uncorrelated_weighted_portfolio_trails_simulation_df):
     #polynoial degree 2 model  b * x**2 + a * x + c
    x_train_poly_d2, x_test_poly_d2, y_train_poly_d2, y_test_poly_d2,popt_poly_d2, \
                pcov_poly_d2, x_model_validation_poly_d2, x_model_testing_poly_d2, model_poly_d2, poly_d2_form = \
                                             polynomial_degree2_model(uncorrelated_weighted_portfolio_trails_simulation_df)
    
    y_model_test_pred_poly_d2 = model_poly_d2(x_test_poly_d2, 0.07, -0.016, -0.009)
    poly_d2_form  = str('y =%.5f * x^2 + %.5f * x + %.5f' % (0.07, -0.016, -0.009))
    
    #polynomial degree 3 log model: a * np.log(b * x) + c*x**3 +d*x**2 + e
    x_train_poly_d3_log, x_test_poly_d3_log, y_train_poly_d3_log, y_test_poly_d3_log,popt_poly_d3_log, pcov_poly_d3_log, \
                                x_model_validation_poly_d3_log, x_model_testing_poly_d3_log, model_poly_d3_log, poly_d3_log_form = \
                                         polynomial_degree3_log_model(uncorrelated_weighted_portfolio_trails_simulation_df)
    
    y_model_test_pred_poly_d3_log = model_poly_d3_log(x_test_poly_d3_log,  0.256, 0.348, 0.00793, -0.060, 0.343) 
    poly_d3_log_form = str('y =%.5f * np.log( %.5f*x) + %.5f * x**3 + %.5f * x + %.5f' % (0.256, 0.348, 0.00793, -0.060, 0.343))
    
    #polynomial degree 5 log model: a*np.log(b*x) + c*x**5 
    x_train_poly_d5_log, x_test_poly_d5_log, y_train_poly_d5_log, y_test_poly_d5_log, popt_poly_d5_log, pcov_poly_d5_log, \
                                x_model_validation_poly_d5_log, x_model_testing_poly_d5_log, model_poly_d5_log, poly_d5_log_form = \
                polynomial_degree5_log_model(uncorrelated_weighted_portfolio_trails_simulation_df)
     
    y_model_test_pred_poly_d5_log = model_poly_d5_log(x_test_poly_d5_log,  0.085, 1.44, -0.00058)
    poly_d5_log_form = str('y =%.5f * np.log( %.5f*x) + %.5f * x**5' % (0.085, 1.44, -0.00058))
       
    return model_poly_d2, model_poly_d3_log, model_poly_d5_log, popt_poly_d2, popt_poly_d3_log, popt_poly_d5_log, x_test_poly_d2, x_test_poly_d3_log, x_test_poly_d5_log, \
           y_test_poly_d2, y_model_test_pred_poly_d2,y_test_poly_d3_log, y_model_test_pred_poly_d3_log, y_test_poly_d5_log, \
           y_model_test_pred_poly_d5_log, poly_d2_form, poly_d3_log_form, poly_d5_log_form

model_poly_d2, model_poly_d3_log, model_poly_d5_log, popt_poly_d2, popt_poly_d3_log, popt_poly_d5_log, x_test_poly_d2, x_test_poly_d3_log, x_test_poly_d5_log, \
y_test_poly_d2, y_model_test_pred_poly_d2,y_test_poly_d3_log, y_model_test_pred_poly_d3_log, y_test_poly_d5_log, \
y_model_test_pred_poly_d5_log, poly_d2_form, poly_d3_log_form, poly_d5_log_form= \
                test_the_model(uncorrelated_weighted_portfolio_trails_simulation_df)      
In [30]:
def model_testing_plotting(uncorrelated_weighted_portfolio_trails_simulation_df):  
    
    fig, ax =plt.subplots(2,2,figsize=(13, 13), constrained_layout=True)  
    
    print(" Model Testing ")
    model_poly_d2, model_poly_d3_log, model_poly_d5_log, popt_poly_d2, popt_poly_d3_log, popt_poly_d5_log, x_test_poly_d2, x_test_poly_d3_log, x_test_poly_d5_log, \
    y_test_poly_d2, y_model_test_pred_poly_d2,y_test_poly_d3_log, y_model_test_pred_poly_d3_log, y_test_poly_d5_log, \
    y_model_test_pred_poly_d5_log, poly_d2_form, poly_d3_log_form, poly_d5_log_form= \
                test_the_model(uncorrelated_weighted_portfolio_trails_simulation_df)
                                                                                   
    cspl1 = models_plotting(x_test_poly_d2, y_model_test_pred_poly_d2, 
                           uncorrelated_weighted_portfolio_trails_simulation_df, fig, ax[0,0], poly_d2_form)
    cspl2 = models_plotting(x_test_poly_d3_log, y_model_test_pred_poly_d3_log, 
                               uncorrelated_weighted_portfolio_trails_simulation_df, fig, ax[0,1], poly_d3_log_form)
    cspl = models_plotting(x_test_poly_d5_log, y_model_test_pred_poly_d5_log, 
                               uncorrelated_weighted_portfolio_trails_simulation_df, fig, ax[1,0], poly_d5_log_form) 
    cplt4 = model_uperBound_efficient_frontier(uncorrelated_weighted_portfolio_trails_simulation_df, \
                                        model_poly_d2,popt_poly_d2, ax[1,1], poly_d2_form)

    cb = fig.colorbar(cspl, ax=ax, label='Sharpe Ratio',orientation='horizontal',shrink=0.6)
    
model_testing_plotting(uncorrelated_weighted_portfolio_trails_simulation_df)
 Model Testing 
array([-0.06841505,  0.23859677, -0.14919578])
0.9642722705586393
array([-0.06841505,  0.23859677, -0.14919578])
0.9642722705586393
array([-0.06841505,  0.23859677, -0.14919578])
0.9642722705586393

Goodness of Fit Statistics¶

In [31]:
def error_metrics_statistics(y_true_0, y_pred_0,y_true_1, y_pred_1,y_true_2, y_pred_2,  poly_d2_form, poly_d3_log_form, poly_d5_log_form ):
    
    display('Poly_d2 : '+poly_d2_form)
    display('Poly_d3_log: '+poly_d3_log_form)
    display('Poly_d5_log: '+poly_d5_log_form)
    
    error_metrics_table = [['Type Error', 'Poly_d2 Error', 'Poly_d3_log Error','Poly_d5_log Error'], 
         ['Mean Absolute Error(MAE)', mean_absolute_error(y_true_0, y_pred_0),mean_absolute_error(y_true_1, y_pred_1),mean_absolute_error(y_true_2, y_pred_2)],
         ['Mean Absolute Percentage Error(MAPE)', mean_absolute_percentage_error(y_true_0, y_pred_0),mean_absolute_percentage_error(y_true_1, y_pred_1),mean_absolute_percentage_error(y_true_2, y_pred_2)],
         ['Neg.Mean Squared Error(RMSE)', -mean_squared_error(y_true_0, y_pred_0),-mean_squared_error(y_true_1, y_pred_1),-mean_squared_error(y_true_2, y_pred_2)],
         ['R-squared score', r2_score(y_true_0, y_pred_0),r2_score(y_true_1, y_pred_1),r2_score(y_true_2, y_pred_2)],
         ['Mean Squared Error(MSE)',mean_squared_error(y_true_0, y_pred_0),mean_squared_error(y_true_1, y_pred_1),mean_squared_error(y_true_2, y_pred_2)],
         ['Mean Squared Log Error(MSLE)', mean_squared_log_error(y_true_0, y_pred_0),mean_squared_log_error(y_true_1, y_pred_1),mean_squared_log_error(y_true_2, y_pred_2)]]
             
    return error_metrics_table
In [32]:
def model_residual_metrics(y_train, y_model_validation_pred, y_model_tuning_pred, y_test, y_model_test_pred):
    
    validation_residual = y_train - y_model_validation_pred
    residuals_tuning_train = y_train - y_model_tuning_pred
    residuals_test = y_test - y_model_test_pred
    return validation_residual, residuals_tuning_train, residuals_test
    
def model_residual_plotting(y_train, y_model_validation_pred, y_model_tuning_pred, y_test, y_model_test_pred, ax, title):
    
    validation_residual, residuals_tuning_train, residuals_test = \
                                    model_residual_metrics(y_train, y_model_validation_pred, y_model_tuning_pred, y_test, y_model_test_pred)
    
    
    sns.scatterplot(ax=ax,x=y_model_validation_pred, y=validation_residual, label='Validation')
    sns.scatterplot(ax=ax,x=y_model_tuning_pred, y=residuals_tuning_train, label='Tuning')
    sns.scatterplot(ax=ax,x=y_model_test_pred, y=residuals_test, label='Test')
    
    ax.hlines(0, min(y_model_validation_pred), max(y_model_validation_pred), colors='r', linestyles='dashed')
    ax.hlines(0, min(y_model_tuning_pred), max(y_model_tuning_pred), colors='r', linestyles='dashed') 
    ax.hlines(0, min(y_model_test_pred), max(y_model_test_pred), colors='r', linestyles='dashed')
    
    ax.set_xlabel('Predicted Values')
    ax.set_ylabel('Residuals')
    ax.set_title(title)
In [33]:
def error_distribution(y_train, y_model_validation_pred, y_model_tuning_pred, y_test, y_model_test_pred, ax, title):
    #residual calculation
    validation_residual, residuals_tuning_train, residuals_test = \
                                    model_residual_metrics(y_train, y_model_validation_pred, y_model_tuning_pred, y_test, y_model_test_pred)
    
    # Calculate errors
    error_validation = -1*validation_residual
    tuning_error_train = -1*residuals_tuning_train
    error_test = -1*residuals_test 

    # Plot error distribution

    sns.histplot(ax=ax, x=error_validation, kde=True, label='Validation errors', color='blue')
    sns.histplot(ax=ax, x=tuning_error_train, kde=True, label='Tuning errors', color='orange')
    sns.histplot(ax=ax, x=error_test, kde=True, label='Test errors', color='green')
    
    ax.set_xlabel('Error')
    ax.set_ylabel('Frequency')
    ax.set_title(title)
    ax.legend()
    
   
In [34]:
def residual_and_error_plotting(uncorrelated_weighted_portfolio_trails_simulation_df):
    
    fig, ax =plt.subplots(2,3,figsize=(23, 17)) 
    #model validation
    popt_poly_d2, popt_poly_d3_log, popt_poly_d5_log, x_model_validation_poly_d2,x_model_validation_poly_d3_log, \
    x_model_validation_poly_d5_log, y_train_poly_d2, y_model_validation_pred_poly_d2, y_train_poly_d3_log, \
    y_model_validation_pred_poly_d3_log, y_train_poly_d5_log, y_model_validation_pred_poly_d5_log, model_poly_d2, \
    model_poly_d3_log, model_poly_d5_log, poly_d2_form, poly_d3_log_form, poly_d5_log_form = \
                                        evalute_model_parameters(uncorrelated_weighted_portfolio_trails_simulation_df)
    # Model Fine-tuning    
    model_poly_d2, model_poly_d3_log, model_poly_d5_log, popt_poly_d2, popt_poly_d3_log, popt_poly_d5_log, x_model_validation_poly_d2, \
    x_model_validation_poly_d3_log, x_model_validation_poly_d5_log, y_train_poly_d2, y_model_tuning_pred_poly_d2, y_train_poly_d3_log, \
    y_model_tuning_pred_poly_d3_log, y_train_poly_d5_log, y_model_tuning_pred_poly_d5_log, poly_d2_form, \
    poly_d3_log_form, poly_d5_log_form= fine_tune_hyperparmeters(uncorrelated_weighted_portfolio_trails_simulation_df)
        
    
    # Model Testing
    model_poly_d2, model_poly_d3_log, model_poly_d5_log, popt_poly_d2, popt_poly_d3_log, popt_poly_d5_log, x_test_poly_d2, x_test_poly_d3_log, x_test_poly_d5_log, \
    y_test_poly_d2, y_model_test_pred_poly_d2,y_test_poly_d3_log, y_model_test_pred_poly_d3_log, y_test_poly_d5_log, \
    y_model_test_pred_poly_d5_log, poly_d2_form, poly_d3_log_form, poly_d5_log_form= \
                                    test_the_model(uncorrelated_weighted_portfolio_trails_simulation_df)
   
    #-------------------------------------residual plotting---------------------------------------------------------------
    #model validation
       
    # poly_d2_residual
    model_residual_plotting(y_train_poly_d2, y_model_validation_pred_poly_d2, y_model_tuning_pred_poly_d2, 
                        y_test_poly_d2, y_model_test_pred_poly_d2, ax[0,0], poly_d2_form)
    #poly_d3_log
    model_residual_plotting(y_train_poly_d3_log, y_model_validation_pred_poly_d3_log, y_model_tuning_pred_poly_d3_log, 
                        y_test_poly_d3_log, y_model_test_pred_poly_d3_log, ax[0,1], poly_d3_log_form)
    #poly_d5_log
    model_residual_plotting(y_train_poly_d5_log, y_model_validation_pred_poly_d5_log, y_model_tuning_pred_poly_d5_log, 
                        y_test_poly_d5_log, y_model_test_pred_poly_d5_log, ax[0,2], poly_d5_log_form)
    
    #-----------error plotting---------------------------------------------------------------------------------------------
    # poly_d2_residual
    error_distribution(y_train_poly_d2, y_model_validation_pred_poly_d2, y_model_tuning_pred_poly_d2, 
                        y_test_poly_d2, y_model_test_pred_poly_d2, ax[1,0],poly_d2_form)
    #poly_d3_log
    error_distribution(y_train_poly_d3_log, y_model_validation_pred_poly_d3_log, y_model_tuning_pred_poly_d3_log, 
                        y_test_poly_d3_log, y_model_test_pred_poly_d3_log, ax[1,1], poly_d3_log_form)
    #poly_d5_log
    error_distribution(y_train_poly_d5_log, y_model_validation_pred_poly_d5_log, y_model_tuning_pred_poly_d5_log, 
                        y_test_poly_d5_log, y_model_test_pred_poly_d5_log, ax[1,2], poly_d5_log_form)
    
In [35]:
def model_evalution_report(uncorrelated_weighted_portfolio_trails_simulation_df):

    print(" Model Validation ")   
    popt_poly_d2, popt_poly_d3_log, popt_poly_d5_log, x_model_validation_poly_d2,x_model_validation_poly_d3_log, \
    x_model_validation_poly_d5_log, y_train_poly_d2, y_model_validation_pred_poly_d2, y_train_poly_d3_log, \
    y_model_validation_pred_poly_d3_log, y_train_poly_d5_log, y_model_validation_pred_poly_d5_log, model_poly_d2, \
    model_poly_d3_log, model_poly_d5_log, poly_d2_form, poly_d3_log_form, poly_d5_log_form = \
                                        evalute_model_parameters(uncorrelated_weighted_portfolio_trails_simulation_df)
 
    print(tabulate(error_metrics_statistics(y_train_poly_d2, y_model_validation_pred_poly_d2, y_train_poly_d3_log, y_model_validation_pred_poly_d3_log, 
                y_train_poly_d5_log, y_model_validation_pred_poly_d5_log, poly_d2_form, poly_d3_log_form, poly_d5_log_form), headers='firstrow',
                   tablefmt='fancy_grid', maxcolwidths=[None, 8]))
    
    print(" Model Fine-tuning ") 
    model_poly_d2, model_poly_d3_log, model_poly_d5_log, popt_poly_d2, popt_poly_d3_log, popt_poly_d5_log, x_model_validation_poly_d2, \
    x_model_validation_poly_d3_log, x_model_validation_poly_d5_log, y_train_poly_d2, y_model_tuning_pred_poly_d2, y_train_poly_d3_log, \
    y_model_tuning_pred_poly_d3_log, y_train_poly_d5_log, y_model_tuning_pred_poly_d5_log, poly_d2_form, \
    poly_d3_log_form, poly_d5_log_form= fine_tune_hyperparmeters(uncorrelated_weighted_portfolio_trails_simulation_df)
        
    print(tabulate(error_metrics_statistics(y_train_poly_d2, y_model_tuning_pred_poly_d2, y_train_poly_d3_log, y_model_tuning_pred_poly_d3_log, y_train_poly_d5_log,
                        y_model_tuning_pred_poly_d5_log, poly_d2_form, poly_d3_log_form, poly_d5_log_form), headers='firstrow',
                   tablefmt='fancy_grid', maxcolwidths=[None, 8]))
    
    print(" Model Testing ")
    model_poly_d2, model_poly_d3_log, model_poly_d5_log, popt_poly_d2, popt_poly_d3_log, popt_poly_d5_log, x_test_poly_d2, x_test_poly_d3_log, x_test_poly_d5_log, \
    y_test_poly_d2, y_model_test_pred_poly_d2,y_test_poly_d3_log, y_model_test_pred_poly_d3_log, y_test_poly_d5_log, \
    y_model_test_pred_poly_d5_log, poly_d2_form, poly_d3_log_form, poly_d5_log_form= \
                test_the_model(uncorrelated_weighted_portfolio_trails_simulation_df)
    

    print(tabulate(error_metrics_statistics(y_test_poly_d2, y_model_test_pred_poly_d2,y_test_poly_d3_log, y_model_test_pred_poly_d3_log, y_test_poly_d5_log, 
                        y_model_test_pred_poly_d5_log,  poly_d2_form, poly_d3_log_form, poly_d5_log_form  ), headers='firstrow', 
                   tablefmt='fancy_grid', maxcolwidths=[None, 8]))
    
    residual_and_error_plotting(uncorrelated_weighted_portfolio_trails_simulation_df)
    

Model Evalution Report¶

In [36]:
 model_evalution_report(uncorrelated_weighted_portfolio_trails_simulation_df)
 Model Validation 
'Poly_d2 : y =0.25607 * x^2 + -0.07319 * x + -0.16664'
'Poly_d3_log: y =-0.01375 * np.log( -4.87781*x) + -0.04449 * x**3 + 0.11653 * x + -0.03184'
'Poly_d5_log: y =0.11275 * np.log( 1.17522*x) + -0.00147 * x**5'
╒══════════════════════════════════════╤═════════════════╤═════════════════════╤═════════════════════╕
│ Type Error                           │   Poly_d2 Error │   Poly_d3_log Error │   Poly_d5_log Error │
╞══════════════════════════════════════╪═════════════════╪═════════════════════╪═════════════════════╡
│ Mean Absolute Error(MAE)             │     0.0126922   │         0.0127389   │         0.0127086   │
├──────────────────────────────────────┼─────────────────┼─────────────────────┼─────────────────────┤
│ Mean Absolute Percentage Error(MAPE) │     0.326107    │         0.326331    │         0.32606     │
├──────────────────────────────────────┼─────────────────┼─────────────────────┼─────────────────────┤
│ Neg.Mean Squared Error(RMSE)         │    -0.000277596 │        -0.000277636 │        -0.000277634 │
├──────────────────────────────────────┼─────────────────┼─────────────────────┼─────────────────────┤
│ R-squared score                      │    -0.709529    │        -0.709779    │        -0.709765    │
├──────────────────────────────────────┼─────────────────┼─────────────────────┼─────────────────────┤
│ Mean Squared Error(MSE)              │     0.000277596 │         0.000277636 │         0.000277634 │
├──────────────────────────────────────┼─────────────────┼─────────────────────┼─────────────────────┤
│ Mean Squared Log Error(MSLE)         │     0.000255259 │         0.000255241 │         0.000255276 │
╘══════════════════════════════════════╧═════════════════╧═════════════════════╧═════════════════════╛
 Model Fine-tuning 
'Poly_d2 : y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'Poly_d3_log: y =0.25600 * np.log( 0.34800*x) + 0.00793 * x**3 + -0.06000 * x + 0.34300'
'Poly_d5_log: y =0.08500 * np.log( 1.44000*x) + -0.00058 * x**5'
╒══════════════════════════════════════╤═════════════════╤═════════════════════╤═════════════════════╕
│ Type Error                           │   Poly_d2 Error │   Poly_d3_log Error │   Poly_d5_log Error │
╞══════════════════════════════════════╪═════════════════╪═════════════════════╪═════════════════════╡
│ Mean Absolute Error(MAE)             │     0.0155419   │         0.0199283   │         0.0157639   │
├──────────────────────────────────────┼─────────────────┼─────────────────────┼─────────────────────┤
│ Mean Absolute Percentage Error(MAPE) │     0.467216    │         0.541214    │         0.440471    │
├──────────────────────────────────────┼─────────────────┼─────────────────────┼─────────────────────┤
│ Neg.Mean Squared Error(RMSE)         │    -0.000392389 │        -0.000555566 │        -0.000368295 │
├──────────────────────────────────────┼─────────────────┼─────────────────────┼─────────────────────┤
│ R-squared score                      │    -1.41646     │        -2.42136     │        -1.26808     │
├──────────────────────────────────────┼─────────────────┼─────────────────────┼─────────────────────┤
│ Mean Squared Error(MSE)              │     0.000392389 │         0.000555566 │         0.000368295 │
├──────────────────────────────────────┼─────────────────┼─────────────────────┼─────────────────────┤
│ Mean Squared Log Error(MSLE)         │     0.000358339 │         0.000502965 │         0.000335809 │
╘══════════════════════════════════════╧═════════════════╧═════════════════════╧═════════════════════╛
 Model Testing 
'Poly_d2 : y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'Poly_d3_log: y =0.25600 * np.log( 0.34800*x) + 0.00793 * x**3 + -0.06000 * x + 0.34300'
'Poly_d5_log: y =0.08500 * np.log( 1.44000*x) + -0.00058 * x**5'
╒══════════════════════════════════════╤═════════════════╤═════════════════════╤═════════════════════╕
│ Type Error                           │   Poly_d2 Error │   Poly_d3_log Error │   Poly_d5_log Error │
╞══════════════════════════════════════╪═════════════════╪═════════════════════╪═════════════════════╡
│ Mean Absolute Error(MAE)             │     0.0101095   │         0.0130635   │         0.00861755  │
├──────────────────────────────────────┼─────────────────┼─────────────────────┼─────────────────────┤
│ Mean Absolute Percentage Error(MAPE) │     0.262366    │         0.271526    │         0.193911    │
├──────────────────────────────────────┼─────────────────┼─────────────────────┼─────────────────────┤
│ Neg.Mean Squared Error(RMSE)         │    -0.000132733 │        -0.000191517 │        -8.07217e-05 │
├──────────────────────────────────────┼─────────────────┼─────────────────────┼─────────────────────┤
│ R-squared score                      │    -0.0889735   │        -0.571248    │         0.33774     │
├──────────────────────────────────────┼─────────────────┼─────────────────────┼─────────────────────┤
│ Mean Squared Error(MSE)              │     0.000132733 │         0.000191517 │         8.07217e-05 │
├──────────────────────────────────────┼─────────────────┼─────────────────────┼─────────────────────┤
│ Mean Squared Log Error(MSLE)         │     0.000121714 │         0.000170487 │         7.29219e-05 │
╘══════════════════════════════════════╧═════════════════╧═════════════════════╧═════════════════════╛

Winning Model¶

In [37]:
def get_wining_model(uncorrelated_weighted_portfolio_trails_simulation_df):
    model_poly_d2, model_poly_d3_log, model_poly_d5_log, popt_poly_d2, popt_poly_d3_log, popt_poly_d5_log, \
    x_test_poly_d2, x_test_poly_d3_log, x_test_poly_d5_log, y_test_poly_d2, y_model_test_pred_poly_d2, \
    y_test_poly_d3_log, y_model_test_pred_poly_d3_log, y_test_poly_d5_log, \
    y_model_test_pred_poly_d5_log, poly_d2_form, poly_d3_log_form, poly_d5_log_form= test_the_model(uncorrelated_weighted_portfolio_trails_simulation_df)
    
    
    return model_poly_d2, popt_poly_d2, poly_d2_form


def plotting_wining_model(uncorrelated_weighted_portfolio_trails_simulation_df, model_poly_d2,popt_poly_d2, poly_d2_form):
    fig, ax =plt.subplots(figsize=(9, 7), constrained_layout=True)
    cplt = model_uperBound_efficient_frontier(uncorrelated_weighted_portfolio_trails_simulation_df, model_poly_d2,popt_poly_d2, ax, poly_d2_form)

    cb = fig.colorbar(cplt, ax=ax, label='Sharpe Ratio',orientation='horizontal',shrink=0.6)
    
model_poly_d2, popt_poly_d2, poly_d2_form = get_wining_model(uncorrelated_weighted_portfolio_trails_simulation_df)
plotting_wining_model(uncorrelated_weighted_portfolio_trails_simulation_df, model_poly_d2,popt_poly_d2, poly_d2_form)

Prediction when the investor's risk level metric (portfolio standard deviation) is known¶

Here we will use the wining efficient frontier model to predict the portfolio expected return. Then will calculater the portfolio weightsand and investment strategy The following 2 Strategies will be implemented to manage the volatility:

  1. Asset Allocation: Adjusting the proportion of different asset classes in a portfolio to balance risk.
  2. Diversification: Spreading investments across various sectors.
In [38]:
def plotting_selected_efficient_frontier_predicted_portfolio(uncorrelated_weighted_portfolio_trails_simulation_df,risk):
    fig, ax =plt.subplots(figsize=(12, 5))
    text = ["A", "B", "C", "D", "E", "F"] 
    model_poly_d2, popt_poly_d2, poly_d2_form = get_wining_model(uncorrelated_weighted_portfolio_trails_simulation_df)
    predicted_return = model_poly_d2(risk, *popt_poly_d2)
    ax.plot(risk, predicted_return,'*',color='red',label='Optimal portfolios')
    scplt = model_uperBound_efficient_frontier(uncorrelated_weighted_portfolio_trails_simulation_df, model_poly_d2,popt_poly_d2, ax, poly_d2_form)
    portfolio_annotation(risk, predicted_return, text, ax)
    cb = fig.colorbar(scplt, ax=ax, label='Sharpe Ratio')
In [39]:
def predict_portfolio_expectded_return(uncorrelated_weighted_portfolio_trails_simulation_df, risk):
    model_poly_d2, popt_poly_d2, poly_d2_form = get_wining_model(uncorrelated_weighted_portfolio_trails_simulation_df)
    display(poly_d2_form)
    return model_poly_d2(risk, *popt_poly_d2)
    
    
pred_portfolio_expected_return  = predict_portfolio_expectded_return(uncorrelated_weighted_portfolio_trails_simulation_df, 1.3)
   
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
In [40]:
def get_assets_expected_returns_and_tickers(log_returns, most_diversify_portfolio_assets_list):
    
    uncorrelated_assets_log_returns = log_returns[most_diversify_portfolio_assets_list]
    uncorrelated_assets_expected_return = uncorrelated_assets_log_returns.mean()
    #display(uncorrelated_assets_espected_return)
    #type(uncorrelated_assets_espected_return)
    assets_ticker_list = uncorrelated_assets_expected_return.index.tolist()
    #display(assets_ticker_list)
    assets_expected_returns_list = uncorrelated_assets_expected_return.to_list()
    return assets_expected_returns_list, assets_ticker_list

assets_expected_returns_list, assets_ticker_list = get_assets_expected_returns_and_tickers(log_returns,most_diversify_portfolio_assets_list)
In [41]:
def get_portfolio_investment_strategy_df( log_returns, uncorrelated_weighted_portfolio_trails_simulation_df, 
                                         most_diversify_portfolio_assets_list, portfolio_risk):

    sum_weight_and_portfolio_return_list = []
       
    #stocks expected return
    assets_expected_returns_list, assets_ticker_list = get_assets_expected_returns_and_tickers(log_returns,most_diversify_portfolio_assets_list)
    assets_expected_returns_list = np.array(assets_expected_returns_list)*100
    assets_expected_returns_list = list(np.round(assets_expected_returns_list, 3))
    
    #predicted portfolio expected return, given the portfolio volatility(risk)
    portfolio_return_predicted_value= round(predict_portfolio_expectded_return(uncorrelated_weighted_portfolio_trails_simulation_df, portfolio_risk),3)
    
    #assets expected return absolute deviation from the portfolio expected return
    assets_expected_return_absolute_deviation_list = abs(portfolio_return_predicted_value - assets_expected_returns_list)
    assets_expected_return_absolute_deviation_list = list(np.round(assets_expected_return_absolute_deviation_list, 3))
    sum_expected_return_absolute_deviation = round(sum(assets_expected_return_absolute_deviation_list),3)
    
    #assets weight coefficients list
    assets_weight_list = assets_expected_return_absolute_deviation_list/sum_expected_return_absolute_deviation
    assets_weight_list = list(np.round(assets_weight_list, 3))
    
    #include the index content into the portfolio strategy data frame
    portfolio_content_df = index_content_df[index_content_df['Ticker'].isin(assets_ticker_list)]                   
    
    #portfolio strategy data frame
    portfolio_investment_strategy_df = pd.DataFrame({'Ticker':assets_ticker_list,'Weight':assets_weight_list,
                                                     'Asset Espected Returns':assets_expected_returns_list})
    portfolio_investment_strategy_df = portfolio_investment_strategy_df.sort_values(by='Weight',ascending=True)
    
    #merge content data frame and the weght data frame
    portfolio_investment_strategy_df = pd.merge(portfolio_content_df, portfolio_investment_strategy_df, how="inner", on=["Ticker"])
   
    return portfolio_investment_strategy_df


portfolio_investment_strategy_df = get_portfolio_investment_strategy_df( log_returns, 
                                                    uncorrelated_weighted_portfolio_trails_simulation_df, most_diversify_portfolio_assets_list, 1.3)
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
In [42]:
def portfolio_annotation(x, y, text, ax):
    # Loop for annotation of all points 
    for i in range(len(x)): 
        ax.annotate(text[i]+'(σp='+str(round(x[i],3))+';E_rp='+ str(round(y[i],3))+')',
                    xy=(x[i], y[i]),xycoords='data', xytext= (x[i], y[i] ))  
  
In [43]:
def plot_investment_strategy_pie_chart(log_returns, uncorrelated_weighted_portfolio_trails_simulation_df, 
                                       most_diversify_portfolio_assets_list, portfolio_risk, risk_profile = ''):
 
    portfolio_investment_strategy_df = get_portfolio_investment_strategy_df( \
                log_returns,uncorrelated_weighted_portfolio_trails_simulation_df, most_diversify_portfolio_assets_list, portfolio_risk)
    portfolio_investment_strategy_df = portfolio_investment_strategy_df.sort_values(by='Weight',ascending=True)
    
    industry_labels = portfolio_investment_strategy_df['Industry'].values
    sector_labels = portfolio_investment_strategy_df['Sector'].values
    weight_values = portfolio_investment_strategy_df['Weight'].values
    
    # Create subplots: use 'domain' type for Pie subplot
    fig = make_subplots(rows=1, cols=2, specs=[[{'type':'domain'}, {'type':'domain'}]])
    
    fig.add_trace(go.Pie(labels=industry_labels, values=weight_values, name="Industry",
                        legendgroup="Industry",  # this can be any string, not just "group"
                        legendgrouptitle_text="Industry"), 1, 1)
    fig.add_trace(go.Pie(labels=sector_labels, values=weight_values, name="Sector",
                        legendgroup="Sector",  # this can be any string, not just "group"
                        legendgrouptitle_text="Sector"), 1, 2)
    
    

    # Use `hole` to create a donut-like pie chart
    fig.update_traces(hole=.5, hoverinfo="label+percent+name")

    fig.update_layout(
    title_text= risk_profile+" Suggested Investment by Industry & Sector",
    # Add annotations in the center of the donut pies.
    annotations=[dict(text='Industry', x=0.14, y=0.5, font_size=20, showarrow=False),
                 dict(text='Sector', x=0.84, y=0.5, font_size=20, showarrow=False)],
    height=500, 
    width=800,
    autosize=True,
    margin=dict(t=0, b=0, l=50, r=0),
    legend_tracegroupgap = 0,
    legend=dict(    
                    orientation="v",
                    yanchor="bottom",
                    y=0,
                    xanchor="right",
                    x=1.5),
     title=dict(
                    y=0.9,
                    x=0.1,
                    xanchor= 'left',
                    yanchor= 'top'))
    
    fig.show()
In [44]:
def plot_asset_return_pie_chart(log_returns, uncorrelated_weighted_portfolio_trails_simulation_df,
                                most_diversify_portfolio_assets_list, portfolio_risk, risk_profile = ''):
 
    portfolio_investment_strategy_df = get_portfolio_investment_strategy_df( \
                log_returns,uncorrelated_weighted_portfolio_trails_simulation_df, most_diversify_portfolio_assets_list, portfolio_risk)
    portfolio_investment_strategy_df = portfolio_investment_strategy_df.sort_values(by='Asset Espected Returns',ascending=True)
    
    industry_labels = portfolio_investment_strategy_df['Industry'].values
    sector_labels = portfolio_investment_strategy_df['Sector'].values
    weight_values = portfolio_investment_strategy_df['Asset Espected Returns'].values
    
    # Create subplots: use 'domain' type for Pie subplot
    fig = make_subplots(rows=1, cols=2, specs=[[{'type':'domain'}, {'type':'domain'}]])
    
    fig.add_trace(go.Pie(labels=industry_labels, values=weight_values, name="Industry",
                        legendgroup="Industry",  # this can be any string, not just "group"
                        legendgrouptitle_text="Industry"), 1, 1)
    fig.add_trace(go.Pie(labels=sector_labels, values=weight_values, name="Sector",
                        legendgroup="Sector",  # this can be any string, not just "group"
                        legendgrouptitle_text="Sector"), 1, 2)

    # Use `hole` to create a donut-like pie chart
    fig.update_traces(hole=.5, hoverinfo="label+percent+name")

    fig.update_layout(
    title_text=risk_profile+" Asset Returns by Industry & Sector",
    # Add annotations in the center of the donut pies.
    annotations=[dict(text='Industry', x=0.14, y=0.5, font_size=20, showarrow=False),
                 dict(text='Sector', x=0.84, y=0.5, font_size=20, showarrow=False)],
    height=500, 
    width=800,
    autosize=True,
    margin=dict(t=0, b=0, l=50, r=0),
    legend_tracegroupgap = 0,
    legend=dict(    
                    orientation="v",
                    yanchor="bottom",
                    y=0,
                    xanchor="right",
                    x=1.5),
     title=dict(
                    y=0.9,
                    x=0.1,
                    xanchor= 'left',
                    yanchor= 'top'))
    
    fig.show()
In [45]:
#Finding weights of portfolio when return given
def plot_asset_return( log_returns, uncorrelated_weighted_portfolio_trails_simulation_df, 
                      most_diversify_portfolio_assets_list, portfolio_risk,risk_profile = ''):
    fig, ax =plt.subplots(figsize=(12, 6))
      
    #plotting Asset Espected Returns
    portfolio_investment_strategy_df = get_portfolio_investment_strategy_df( \
                log_returns,uncorrelated_weighted_portfolio_trails_simulation_df, most_diversify_portfolio_assets_list, portfolio_risk)
    portfolio_investment_strategy_df = portfolio_investment_strategy_df.sort_values(by='Asset Espected Returns',ascending=True)
    column_list = [':      ' for i in range(len(portfolio_investment_strategy_df))]
    column_df = pd.DataFrame({'colum': column_list})
    
    asset_return = portfolio_investment_strategy_df['Asset Espected Returns']
    strategy_Tickers = portfolio_investment_strategy_df['Sector'] + column_df['colum'] + \
                        portfolio_investment_strategy_df['Industry'] + column_df['colum'] + \
                        portfolio_investment_strategy_df['Company'] + \
                        column_df['colum'] + portfolio_investment_strategy_df['Ticker'] 
    
    bar_container= ax.barh(strategy_Tickers, asset_return*100)
    ax.axes.get_xaxis().set_visible(False)
    # setting label of y-axis
    ax.set_ylabel("Asset Tickers")
    # setting label of x-axis
    ax.set_xlabel("Asset Return") 
    ax.set_title(risk_profile+" Asset Return",fontsize=22,  horizontalalignment='right',fontweight='roman')
    ax.bar_label(bar_container, fmt='{:,.1f}%')
    
    
    plt.show()
    #Asset return pie chart
    plot_asset_return_pie_chart( log_returns, uncorrelated_weighted_portfolio_trails_simulation_df, 
                                most_diversify_portfolio_assets_list, portfolio_risk, risk_profile)  
In [46]:
#Finding weights of portfolio when return given
def plot_predicted_portfolio_weight( log_returns, uncorrelated_weighted_portfolio_trails_simulation_df, 
                                    most_diversify_portfolio_assets_list, portfolio_risk, risk_profile = ''):
    fig, ax =plt.subplots(figsize=(12, 6))
    portfolio_investment_strategy_df = get_portfolio_investment_strategy_df( \
                log_returns,uncorrelated_weighted_portfolio_trails_simulation_df, most_diversify_portfolio_assets_list, portfolio_risk)
    portfolio_investment_strategy_df = portfolio_investment_strategy_df.sort_values(by='Weight',ascending=True)
    column_list = [':      ' for i in range(len(portfolio_investment_strategy_df))]
    column_df = pd.DataFrame({'colum': column_list})
   
    #plotting
    display(portfolio_investment_strategy_df.style.hide(axis='index'))
    
    strategy_Weight = portfolio_investment_strategy_df['Weight']
    strategy_Tickers = portfolio_investment_strategy_df['Sector'] + column_df['colum'] + \
                        portfolio_investment_strategy_df['Industry'] + column_df['colum'] + \
                        portfolio_investment_strategy_df['Company'] + \
                        column_df['colum'] + portfolio_investment_strategy_df['Ticker'] 
    bar_container= ax.barh(strategy_Tickers, strategy_Weight*100)
   
    ax.axes.get_xaxis().set_visible(False)
    #setting label of y-axis
    ax.set_ylabel("Asset Tickers")
    # setting label of x-axis
    ax.set_xlabel("Portfolio Weight") 
    ax.set_title(risk_profile+" suggested Portfolio Allocation", fontsize=22, horizontalalignment='right')
    ax.bar_label(bar_container, fmt='{:,.1f}%')
        
    plt.show()
    
    #Investement strategy pie chart
    plot_investment_strategy_pie_chart( log_returns, uncorrelated_weighted_portfolio_trails_simulation_df, 
                                       most_diversify_portfolio_assets_list, portfolio_risk, risk_profile)
      
In [47]:
def get_portolio_risk_input(uncorrelated_weighted_portfolio_trails_simulation_df, portfolio_risk):
    predited_portfolio_return = predict_portfolio_expectded_return(uncorrelated_weighted_portfolio_trails_simulation_df, portfolio_risk)
    prediction_df = pd.DataFrame([{'portfolio_risk':portfolio_risk,'Predited Portfolio Return':predited_portfolio_return}])
    display(prediction_df.style.hide(axis='index'))
In [48]:
def plot_risk_tolerence_treshold(uncorrelated_weighted_portfolio_trails_simulation_df, portfolio_risk = 1.3):
    risk_tolerence_threshold_df = risk_tolerence_threshold(uncorrelated_weighted_portfolio_trails_simulation_df, portfolio_risk)

    print('\n                                              **********************************************************************\n'+
              '                                               Optimal Portfolio Table - Winning Model and Efficient Frontier\n'+
              '                                              **********************************************************************\n')
    display(risk_tolerence_threshold_df)
    portfolio_risk_values =  risk_tolerence_threshold_df['Portfolio Risk(volatility)'].values
    plotting_selected_efficient_frontier_predicted_portfolio(uncorrelated_weighted_portfolio_trails_simulation_df,portfolio_risk_values)
 
In [49]:
def risk_tolerence_threshold(uncorrelated_weighted_portfolio_trails_simulation_df, portfolio_risk = 1.3):
    #define threshold to track the investor risk tolerence:High risk tolerance (aggressive investors), Moderate risk tolerance (moderate investors)
    #Low risk tolerance (conservative investors)
    pred_random_portfolio_return = predict_portfolio_expectded_return(uncorrelated_weighted_portfolio_trails_simulation_df, portfolio_risk)
    
    max_E_rp_sharpe_ratio, max_E_rp, max_E_rp_σp = get_maximun_return_portfolio(uncorrelated_weighted_portfolio_trails_simulation_df)
    pred_maximun_return_portfolio = predict_portfolio_expectded_return(uncorrelated_weighted_portfolio_trails_simulation_df, max_E_rp_σp)
    
    max_σp_E_rp_sharpe_ratio, max_σp_E_rp, max_σp = get_maximun_risk_portfolio(uncorrelated_weighted_portfolio_trails_simulation_df)
    pred_maximun_risk_portfolio = predict_portfolio_expectded_return(uncorrelated_weighted_portfolio_trails_simulation_df, max_σp)

    maximum_sharpe_ratio, maximum_sharpe_ratio_σp_E_rp, maximum_sharpe_ratio_σp =  get_maximum_sharpe_ratio(uncorrelated_weighted_portfolio_trails_simulation_df)
    pred_maximum_sharpe_ratio = predict_portfolio_expectded_return(uncorrelated_weighted_portfolio_trails_simulation_df, maximum_sharpe_ratio_σp)

    minimum_σp_E_rp_sharpe_ratio, minimum_σp_E_rp, minimum_σp = get_minimum_risk_portfolio(uncorrelated_weighted_portfolio_trails_simulation_df)
    pred_minimum_risk_portfolio = predict_portfolio_expectded_return(uncorrelated_weighted_portfolio_trails_simulation_df, minimum_σp)
    avg_risk = uncorrelated_weighted_portfolio_trails_simulation_df['σp'].mean()
    pred_avg_risk_Expected_return = predict_portfolio_expectded_return(uncorrelated_weighted_portfolio_trails_simulation_df, avg_risk)
    
    index = ['A', 'B', 'C', 'D', 'E', 'F'] 
    risk_tolerence_threshold_df =pd.DataFrame({'Portfolio Type': ['Random Portfolio', 'Maximun Return Portfolio','Maximun Risk Portfolio',
                                                                  'Maximum Sharpe Ratio(Tangent Portfolio)', 
                                              'Minimum Risk Portfolio', 'Average Volatilty'],
                              'Predicted Expected Return': [pred_random_portfolio_return, pred_maximun_return_portfolio, pred_maximun_risk_portfolio, 
                                                            pred_maximum_sharpe_ratio, pred_minimum_risk_portfolio, pred_avg_risk_Expected_return ],
                              'Portfolio Risk(volatility)':[portfolio_risk, max_E_rp_σp, max_σp, maximum_sharpe_ratio_σp, minimum_σp, avg_risk],          
                              'Sharpe Ratio':[pred_random_portfolio_return/portfolio_risk, pred_maximun_return_portfolio/maximum_sharpe_ratio_σp, 
                                              pred_maximun_risk_portfolio/max_σp, pred_maximum_sharpe_ratio/maximum_sharpe_ratio_σp ,
                                              pred_minimum_risk_portfolio/ minimum_σp, pred_avg_risk_Expected_return/avg_risk]},
                                           index=index)
    
    return risk_tolerence_threshold_df
In [50]:
def plot_suggested_portfolio_structure( log_returns, uncorrelated_weighted_portfolio_trails_simulation_df, 
                                       most_diversify_portfolio_assets_list, portfolio_risk):
    
    risk_tolerence_threshold_df =  risk_tolerence_threshold(uncorrelated_weighted_portfolio_trails_simulation_df, portfolio_risk)
    
    for i in range(len(risk_tolerence_threshold_df)):
        portfolio_risk = risk_tolerence_threshold_df['Portfolio Risk(volatility)'][i]
        predicted_expected_return = risk_tolerence_threshold_df['Predicted Expected Return'][i]
        sharpe_ratio = risk_tolerence_threshold_df['Sharpe Ratio'][i]
        print('\n                                    *************************************\n'+
              '                                      Portfolio Risk(volatility)  : '+str(round(portfolio_risk,3))+'\n'+
              '                                      Predicted Expected Return   : '+str(round(predicted_expected_return,3))+'\n'+
              '                                      Sharpe Ratio                : '+str(round(sharpe_ratio,3))+'\n'
              '                                     *************************************\n')
        plot_predicted_portfolio_weight( log_returns, uncorrelated_weighted_portfolio_trails_simulation_df, 
                                        most_diversify_portfolio_assets_list, portfolio_risk)    
        plot_asset_return( log_returns, uncorrelated_weighted_portfolio_trails_simulation_df, most_diversify_portfolio_assets_list, portfolio_risk)  
In [51]:
def risk_tolerence_encoding(uncorrelated_weighted_portfolio_trails_simulation_df):
    
    risk_tolerence_threshold_df = risk_tolerence_threshold(uncorrelated_weighted_portfolio_trails_simulation_df)
    max_Erp = risk_tolerence_threshold_df['Portfolio Risk(volatility)']['B']
    max_riskp = risk_tolerence_threshold_df['Portfolio Risk(volatility)']['C']
    max_shape_ratiop = risk_tolerence_threshold_df['Portfolio Risk(volatility)']['D']
    min_riskp = risk_tolerence_threshold_df['Portfolio Risk(volatility)']['E']
    avg_riskp = risk_tolerence_threshold_df['Portfolio Risk(volatility)']['F'] 
    #σp	E_rp
    
    simulated_risk_list =  uncorrelated_weighted_portfolio_trails_simulation_df['σp']
    pred_Expected_return_list = predict_portfolio_expectded_return(uncorrelated_weighted_portfolio_trails_simulation_df, simulated_risk_list)
    sharpe_ratio_list =pred_Expected_return_list/simulated_risk_list
    
    risk_profile_list = []
    risk_profile_encoding_list = []
    
    for i in range(len(simulated_risk_list)):
        portfolio_risk = simulated_risk_list[i]
        if portfolio_risk >= max_shape_ratiop and portfolio_risk <=avg_riskp :
            risk_profile_list.append('Moderate')
            risk_profile_encoding_list.append(1)
        elif portfolio_risk <max_shape_ratiop:
            risk_profile_list.append('Conservative')
            risk_profile_encoding_list.append(2)            
        elif portfolio_risk  > avg_riskp:
            risk_profile_list.append('Aggressive')
            risk_profile_encoding_list.append(3)
            
    risk_tolerence_rating_df= pd.DataFrame({'Simulated Risk': simulated_risk_list, 'Predicted Expected Return':pred_Expected_return_list,
                                            'Sharpe Ratio':sharpe_ratio_list, 'Risk Profile':risk_profile_list, 
                                            'Risk Profile Encoding Value':risk_profile_encoding_list})           
    return  risk_tolerence_rating_df     
        
risk_tolerence_encoding_df = risk_tolerence_encoding(uncorrelated_weighted_portfolio_trails_simulation_df)
display(risk_tolerence_encoding_df)
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
Simulated Risk Predicted Expected Return Sharpe Ratio Risk Profile Risk Profile Encoding Value
0 1.591041 0.055515 0.034892 Aggressive 3
1 1.461517 0.051284 0.035090 Aggressive 3
2 1.867592 0.056328 0.030161 Aggressive 3
3 1.423179 0.049561 0.034824 Moderate 1
4 1.282241 0.041377 0.032270 Moderate 1
... ... ... ... ... ...
9995 1.279847 0.041213 0.032202 Moderate 1
9996 1.345417 0.045405 0.033748 Moderate 1
9997 1.636396 0.056416 0.034476 Aggressive 3
9998 1.413588 0.049097 0.034732 Moderate 1
9999 1.380792 0.047406 0.034332 Moderate 1

10000 rows × 5 columns

In [52]:
def get_risk_profile_matrix(uncorrelated_weighted_portfolio_trails_simulation_df):
    risk_tolerence_encoding_df = risk_tolerence_encoding(uncorrelated_weighted_portfolio_trails_simulation_df)
    risk_profile_matrix =  risk_tolerence_encoding_df.groupby('Risk Profile')[['Simulated Risk','Predicted Expected Return','Sharpe Ratio']].mean()
    return pd.DataFrame(risk_profile_matrix)
   
risk_profile_matrix = get_risk_profile_matrix(uncorrelated_weighted_portfolio_trails_simulation_df)
risk_profile_matrix
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
Out[52]:
Simulated Risk Predicted Expected Return Sharpe Ratio
Risk Profile
Aggressive 1.523578 0.053271 0.034979
Conservative 1.235111 0.037888 0.030628
Moderate 1.371376 0.046754 0.034070
In [53]:
def plot_suggested_risk_profile_portfolio_structure( log_returns, uncorrelated_weighted_portfolio_trails_simulation_df, 
                                                    most_diversify_portfolio_assets_list, portfolio_risk=1.3):
    
    #risk_tolerence_threshold_df =  risk_tolerence_threshold(uncorrelated_weighted_portfolio_trails_simulation_df, portfolio_risk)
    risk_profile_matrix = get_risk_profile_matrix(uncorrelated_weighted_portfolio_trails_simulation_df)
    
    print('\n                                              **********************************************************************\n'+
              '                                                    Investment Profile Simulation And Portfolio Allocation \n'+
              '                                              **********************************************************************\n')
    display(risk_profile_matrix)
    for i in range(len(risk_profile_matrix)):
        risk_profile = risk_profile_matrix.index[i]
        portfolio_risk = risk_profile_matrix['Simulated Risk'][i]
        predicted_expected_return = risk_profile_matrix['Predicted Expected Return'][i]
        sharpe_ratio = risk_profile_matrix['Sharpe Ratio'][i]
        print('\n                                    *****************************************************\n'+
              '                                      Risk Profile                : '+risk_profile+' Investment \n'+
              '                                      Simulated Risk              : '+str(round(portfolio_risk,3))+'\n'+
              '                                      Predicted Expected Return   : '+str(round(predicted_expected_return,3))+'\n'+
              '                                      Sharpe Ratio                : '+str(round(sharpe_ratio,3))+'\n'
              '                                     *****************************************************\n')
        plot_predicted_portfolio_weight( log_returns, uncorrelated_weighted_portfolio_trails_simulation_df, 
                                        most_diversify_portfolio_assets_list, portfolio_risk,risk_profile)    
        plot_asset_return( log_returns, uncorrelated_weighted_portfolio_trails_simulation_df, 
                          most_diversify_portfolio_assets_list, portfolio_risk,risk_profile)  
In [54]:
plot_risk_tolerence_treshold(uncorrelated_weighted_portfolio_trails_simulation_df)
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
                                              **********************************************************************
                                               Optimal Portfolio Table - Winning Model and Efficient Frontier
                                              **********************************************************************

Portfolio Type Predicted Expected Return Portfolio Risk(volatility) Sharpe Ratio
A Random Portfolio 0.042569 1.300000 0.032745
B Maximun Return Portfolio 0.057326 1.767518 0.044862
C Maximun Risk Portfolio 0.037199 2.274133 0.016357
D Maximum Sharpe Ratio(Tangent Portfolio) 0.041076 1.277847 0.032144
E Minimum Risk Portfolio 0.022132 1.055724 0.020964
F Average Volatilty 0.050252 1.437986 0.034946
In [55]:
plot_suggested_portfolio_structure( log_returns, uncorrelated_weighted_portfolio_trails_simulation_df, 
                                   most_diversify_portfolio_assets_list, 1.3)
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
                                    *************************************
                                      Portfolio Risk(volatility)  : 1.3
                                      Predicted Expected Return   : 0.043
                                      Sharpe Ratio                : 0.033
                                     *************************************

'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
Ticker Company Sector Industry Weight Asset Espected Returns
BN Brookfield Corporation Financial Services Asset Management 0.016000 0.047000
ENB Enbridge Inc. Energy Oil & Gas Storage/Transport 0.023000 0.037000
PEY Peyto Exploration & Development Corp. Energy Oil & Gas Exploration and Production 0.047000 0.031000
BMO Bank of Montreal Financial Services Banks 0.051000 0.030000
DOL Dollarama Inc. Consumer Defensive Retail Defensive 0.066000 0.026000
TD Toronto-Dominion Bank Financial Services Banks 0.074000 0.024000
DOO BRP Inc. Consumer Cyclical Vehicles & Parts 0.113000 0.014000
NGD New Gold Inc. Basic Materials Metals & Mining 0.121000 0.074000
IGM IGM Financial Inc. Financial Services Asset Management 0.125000 0.075000
TVE Tamarack Valley Energy Ltd. Energy Oil & Gas Exploration and Production 0.168000 0.000000
CNQ Canadian Natural Resources Limited Energy Oil & Gas Exploration and Production 0.195000 0.093000
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
                                    *************************************
                                      Portfolio Risk(volatility)  : 1.768
                                      Predicted Expected Return   : 0.057
                                      Sharpe Ratio                : 0.045
                                     *************************************

'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
Ticker Company Sector Industry Weight Asset Espected Returns
BN Brookfield Corporation Financial Services Asset Management 0.031000 0.047000
NGD New Gold Inc. Basic Materials Metals & Mining 0.053000 0.074000
IGM IGM Financial Inc. Financial Services Asset Management 0.057000 0.075000
ENB Enbridge Inc. Energy Oil & Gas Storage/Transport 0.063000 0.037000
PEY Peyto Exploration & Development Corp. Energy Oil & Gas Exploration and Production 0.082000 0.031000
BMO Bank of Montreal Financial Services Banks 0.085000 0.030000
DOL Dollarama Inc. Consumer Defensive Retail Defensive 0.097000 0.026000
TD Toronto-Dominion Bank Financial Services Banks 0.104000 0.024000
CNQ Canadian Natural Resources Limited Energy Oil & Gas Exploration and Production 0.113000 0.093000
DOO BRP Inc. Consumer Cyclical Vehicles & Parts 0.135000 0.014000
TVE Tamarack Valley Energy Ltd. Energy Oil & Gas Exploration and Production 0.179000 0.000000
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
                                    *************************************
                                      Portfolio Risk(volatility)  : 2.274
                                      Predicted Expected Return   : 0.037
                                      Sharpe Ratio                : 0.016
                                     *************************************

'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
Ticker Company Sector Industry Weight Asset Espected Returns
ENB Enbridge Inc. Energy Oil & Gas Storage/Transport 0.000000 0.037000
PEY Peyto Exploration & Development Corp. Energy Oil & Gas Exploration and Production 0.025000 0.031000
BMO Bank of Montreal Financial Services Banks 0.029000 0.030000
BN Brookfield Corporation Financial Services Asset Management 0.042000 0.047000
DOL Dollarama Inc. Consumer Defensive Retail Defensive 0.046000 0.026000
TD Toronto-Dominion Bank Financial Services Banks 0.055000 0.024000
DOO BRP Inc. Consumer Cyclical Vehicles & Parts 0.097000 0.014000
NGD New Gold Inc. Basic Materials Metals & Mining 0.155000 0.074000
TVE Tamarack Valley Energy Ltd. Energy Oil & Gas Exploration and Production 0.155000 0.000000
IGM IGM Financial Inc. Financial Services Asset Management 0.160000 0.075000
CNQ Canadian Natural Resources Limited Energy Oil & Gas Exploration and Production 0.235000 0.093000
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
                                    *************************************
                                      Portfolio Risk(volatility)  : 1.278
                                      Predicted Expected Return   : 0.041
                                      Sharpe Ratio                : 0.032
                                     *************************************

'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
Ticker Company Sector Industry Weight Asset Espected Returns
ENB Enbridge Inc. Energy Oil & Gas Storage/Transport 0.016000 0.037000
BN Brookfield Corporation Financial Services Asset Management 0.024000 0.047000
PEY Peyto Exploration & Development Corp. Energy Oil & Gas Exploration and Production 0.040000 0.031000
BMO Bank of Montreal Financial Services Banks 0.044000 0.030000
DOL Dollarama Inc. Consumer Defensive Retail Defensive 0.060000 0.026000
TD Toronto-Dominion Bank Financial Services Banks 0.068000 0.024000
DOO BRP Inc. Consumer Cyclical Vehicles & Parts 0.108000 0.014000
NGD New Gold Inc. Basic Materials Metals & Mining 0.132000 0.074000
IGM IGM Financial Inc. Financial Services Asset Management 0.136000 0.075000
TVE Tamarack Valley Energy Ltd. Energy Oil & Gas Exploration and Production 0.164000 0.000000
CNQ Canadian Natural Resources Limited Energy Oil & Gas Exploration and Production 0.208000 0.093000
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
                                    *************************************
                                      Portfolio Risk(volatility)  : 1.056
                                      Predicted Expected Return   : 0.022
                                      Sharpe Ratio                : 0.021
                                     *************************************

'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
Ticker Company Sector Industry Weight Asset Espected Returns
TD Toronto-Dominion Bank Financial Services Banks 0.007000 0.024000
DOL Dollarama Inc. Consumer Defensive Retail Defensive 0.015000 0.026000
BMO Bank of Montreal Financial Services Banks 0.030000 0.030000
DOO BRP Inc. Consumer Cyclical Vehicles & Parts 0.030000 0.014000
PEY Peyto Exploration & Development Corp. Energy Oil & Gas Exploration and Production 0.033000 0.031000
ENB Enbridge Inc. Energy Oil & Gas Storage/Transport 0.056000 0.037000
TVE Tamarack Valley Energy Ltd. Energy Oil & Gas Exploration and Production 0.082000 0.000000
BN Brookfield Corporation Financial Services Asset Management 0.093000 0.047000
NGD New Gold Inc. Basic Materials Metals & Mining 0.193000 0.074000
IGM IGM Financial Inc. Financial Services Asset Management 0.197000 0.075000
CNQ Canadian Natural Resources Limited Energy Oil & Gas Exploration and Production 0.264000 0.093000
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
                                    *************************************
                                      Portfolio Risk(volatility)  : 1.438
                                      Predicted Expected Return   : 0.05
                                      Sharpe Ratio                : 0.035
                                     *************************************

'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
Ticker Company Sector Industry Weight Asset Espected Returns
BN Brookfield Corporation Financial Services Asset Management 0.011000 0.047000
ENB Enbridge Inc. Energy Oil & Gas Storage/Transport 0.046000 0.037000
PEY Peyto Exploration & Development Corp. Energy Oil & Gas Exploration and Production 0.067000 0.031000
BMO Bank of Montreal Financial Services Banks 0.071000 0.030000
DOL Dollarama Inc. Consumer Defensive Retail Defensive 0.085000 0.026000
NGD New Gold Inc. Basic Materials Metals & Mining 0.085000 0.074000
IGM IGM Financial Inc. Financial Services Asset Management 0.088000 0.075000
TD Toronto-Dominion Bank Financial Services Banks 0.092000 0.024000
DOO BRP Inc. Consumer Cyclical Vehicles & Parts 0.127000 0.014000
CNQ Canadian Natural Resources Limited Energy Oil & Gas Exploration and Production 0.152000 0.093000
TVE Tamarack Valley Energy Ltd. Energy Oil & Gas Exploration and Production 0.177000 0.000000
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
In [56]:
plot_suggested_risk_profile_portfolio_structure( log_returns, uncorrelated_weighted_portfolio_trails_simulation_df, 
                                                most_diversify_portfolio_assets_list, portfolio_risk=1.3)
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
                                              **********************************************************************
                                                    Investment Profile Simulation And Portfolio Allocation 
                                              **********************************************************************

Simulated Risk Predicted Expected Return Sharpe Ratio
Risk Profile
Aggressive 1.523578 0.053271 0.034979
Conservative 1.235111 0.037888 0.030628
Moderate 1.371376 0.046754 0.034070
                                    *****************************************************
                                      Risk Profile                : Aggressive Investment 
                                      Simulated Risk              : 1.524
                                      Predicted Expected Return   : 0.053
                                      Sharpe Ratio                : 0.035
                                     *****************************************************

'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
Ticker Company Sector Industry Weight Asset Espected Returns
BN Brookfield Corporation Financial Services Asset Management 0.023000 0.047000
ENB Enbridge Inc. Energy Oil & Gas Storage/Transport 0.056000 0.037000
NGD New Gold Inc. Basic Materials Metals & Mining 0.066000 0.074000
IGM IGM Financial Inc. Financial Services Asset Management 0.069000 0.075000
PEY Peyto Exploration & Development Corp. Energy Oil & Gas Exploration and Production 0.076000 0.031000
BMO Bank of Montreal Financial Services Banks 0.079000 0.030000
DOL Dollarama Inc. Consumer Defensive Retail Defensive 0.092000 0.026000
TD Toronto-Dominion Bank Financial Services Banks 0.099000 0.024000
CNQ Canadian Natural Resources Limited Energy Oil & Gas Exploration and Production 0.129000 0.093000
DOO BRP Inc. Consumer Cyclical Vehicles & Parts 0.132000 0.014000
TVE Tamarack Valley Energy Ltd. Energy Oil & Gas Exploration and Production 0.178000 0.000000
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
                                    *****************************************************
                                      Risk Profile                : Conservative Investment 
                                      Simulated Risk              : 1.235
                                      Predicted Expected Return   : 0.038
                                      Sharpe Ratio                : 0.031
                                     *****************************************************

'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
Ticker Company Sector Industry Weight Asset Espected Returns
ENB Enbridge Inc. Energy Oil & Gas Storage/Transport 0.004000 0.037000
PEY Peyto Exploration & Development Corp. Energy Oil & Gas Exploration and Production 0.029000 0.031000
BMO Bank of Montreal Financial Services Banks 0.033000 0.030000
BN Brookfield Corporation Financial Services Asset Management 0.037000 0.047000
DOL Dollarama Inc. Consumer Defensive Retail Defensive 0.050000 0.026000
TD Toronto-Dominion Bank Financial Services Banks 0.058000 0.024000
DOO BRP Inc. Consumer Cyclical Vehicles & Parts 0.100000 0.014000
NGD New Gold Inc. Basic Materials Metals & Mining 0.149000 0.074000
IGM IGM Financial Inc. Financial Services Asset Management 0.154000 0.075000
TVE Tamarack Valley Energy Ltd. Energy Oil & Gas Exploration and Production 0.158000 0.000000
CNQ Canadian Natural Resources Limited Energy Oil & Gas Exploration and Production 0.228000 0.093000
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
                                    *****************************************************
                                      Risk Profile                : Moderate Investment 
                                      Simulated Risk              : 1.371
                                      Predicted Expected Return   : 0.047
                                      Sharpe Ratio                : 0.034
                                     *****************************************************

'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
Ticker Company Sector Industry Weight Asset Espected Returns
BN Brookfield Corporation Financial Services Asset Management 0.000000 0.047000
ENB Enbridge Inc. Energy Oil & Gas Storage/Transport 0.037000 0.037000
PEY Peyto Exploration & Development Corp. Energy Oil & Gas Exploration and Production 0.060000 0.031000
BMO Bank of Montreal Financial Services Banks 0.063000 0.030000
DOL Dollarama Inc. Consumer Defensive Retail Defensive 0.078000 0.026000
TD Toronto-Dominion Bank Financial Services Banks 0.086000 0.024000
NGD New Gold Inc. Basic Materials Metals & Mining 0.101000 0.074000
IGM IGM Financial Inc. Financial Services Asset Management 0.104000 0.075000
DOO BRP Inc. Consumer Cyclical Vehicles & Parts 0.123000 0.014000
CNQ Canadian Natural Resources Limited Energy Oil & Gas Exploration and Production 0.172000 0.093000
TVE Tamarack Valley Energy Ltd. Energy Oil & Gas Exploration and Production 0.175000 0.000000
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'

Investment Risk Profiles Simulation and Portfolio Strategy Using K-means Clustering¶

My main objective is to use K-means clustering to found out the investment type of risk tolerance: very conservative, conservative, moderate, aggressive and very aggressive. From the Elbow model, I will assume the optimal number of clusters is 5

In this section, I will combine K-means clustering with efficient frontier modeling to dig into the random generated portfolios. In order to simulate the investors risk tolerance, I will use K-means clustering and optimal portfolio modeling on top of the previous covariance matrix technique that I used along with the covariance threshold to reduce the volume of the assets. I used the covariance coefficient to filter uncorrelated assets. Then I will recommend an investment strategy that optimize return for each type for risk tolerance. I'm using the wining model 'y =0.07000 x^2 + -0.01600 x + -0.00900' to predict the portfolio expected returns. The simulated portfolio risk is combined with the simulated the portfolio expected return and the predicted expected return to set the random efficient frontier data. The random efficient frontier data is then used as input for the K-means cluster models. The cluster centroid corresponding to each type of risk tolerance is projected on the winning efficient frontier model in order to find out the optimal portfolio that corresponds to each type of risk tolerance.

In [57]:
def calculate_number_of_cluster(uncorrelated_weighted_portfolio_trails_simulation_df, n_components, ax):
    # Randomn efficient frontier data collection
    portfolio_risk_list = uncorrelated_weighted_portfolio_trails_simulation_df['σp']
    portefolio_return_list = uncorrelated_weighted_portfolio_trails_simulation_df['E_rp']
    predited_portfolio_return_list = predict_portfolio_expectded_return(uncorrelated_weighted_portfolio_trails_simulation_df, portfolio_risk_list)
    clipped_df = dataframe_clipping(portfolio_risk_list, portefolio_return_list, predited_portfolio_return_list )
    
    range_nbr_clusters = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
    # Step 2: Standardize the data
    scaler = StandardScaler()
    scaled_efficient_Frontier_data = scaler.fit_transform(clipped_df)


    #Determine the Number of clusters using Within Cluster Sum of Squares(wcss)
    wcss = [] # (Within Cluster Sum of Squares:inertia)
    silhouette_average_list = []
    
    for n2 in range_nbr_clusters:
        kmeans = KMeans(n_clusters=n2, init ='k-means++', max_iter=300,  n_init=10,random_state=0 )
        
        kmeans.fit(clipped_df)
        wcss.append(kmeans.inertia_)
        cluster_labels = kmeans.fit_predict(clipped_df)
        silhouette_average_list.append(silhouette_score(clipped_df, cluster_labels))
    
    ax1 = ax.twinx()
    ax.plot(range_nbr_clusters, wcss, 'b-', marker='o')
    ax1.plot(range_nbr_clusters,silhouette_average_list, 'g-', marker='o')
    
    ax.set_xlabel('Number of Clusters')
    ax.set_ylabel('Within Cluster Sum of Squares(wcss)')
    ax1.set_ylabel('Silhouette score')
    ax.set_title('Elbow Method & Silhouette Analysis for Optimal Number of Clusters') 
    #plt.show()
    
    return clipped_df
In [58]:
def implement_k_means_clusters(uncorrelated_weighted_portfolio_trails_simulation_df, clipped_df, fig, ax):
    # our main objective is to use K-means clustering to found out investment risk profile: very conservative, conservative, moderate, aggressive and very aggressive.
    # So from the Elbow model, let's assume the optimal number of clusters is 5

    kmeans = KMeans(n_clusters=5, init ='k-means++', max_iter=300, n_init=10,random_state=0 )
    #clusters = kmeans.fit_predict(pca_data)
    pred_clusters = kmeans.fit_predict(clipped_df)
    
    rand_data_point_and_cluster_df = clipped_df
    rand_data_point_and_cluster_df['cluster'] = pred_clusters
    
    
    #investment profile
    investment_profiles_index = ['Moderate', 'Conservative', 'Agressive', 'Very Aggressive', 'Very Conservative'] 
    investment_profiles_color = ['purple', 'gold', 'limegreen', 'green', 'yellow'] 
    display(rand_data_point_and_cluster_df)
    
    #plot cluster
    for i in range(len(investment_profiles_index)):
        cspl = ax.scatter(x=rand_data_point_and_cluster_df.loc[(rand_data_point_and_cluster_df['cluster'] ==i), ['σp']], 
                      y=rand_data_point_and_cluster_df.loc[(rand_data_point_and_cluster_df['cluster'] ==i), ['E_rp']], 
                      c= investment_profiles_color[i], cmap="viridis",label=investment_profiles_index[i]) 
        
    # find clusters centratides
    cluster_centers_df = pd.DataFrame(kmeans.cluster_centers_)
    cluster_centers_df = cluster_centers_df.set_axis( kmeans.feature_names_in_ , axis=1)
    cluster_centers_df.index =  investment_profiles_index
    cluster_centers_df.index.names = ['Investment Profile']   
    
    #plot clusters centroid. 
    ax.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], marker=".", s=100, c='red', label = 'Cluster Centroids')
    
    #plot efficienr frontier model
    model_poly_d2, popt_poly_d2, poly_d2_form = get_wining_model(uncorrelated_weighted_portfolio_trails_simulation_df)
    xpoints,ypoints,top_sharpe_ratio_value_points = efficient_frontiere_optimal_portfolios_model_points(clipped_df,7)
    row, col = uncorrelated_weighted_portfolio_trails_simulation_df.shape
    x_model_σp = np.linspace(xpoints.min(), xpoints.max(), row)
    y_model_E_rp_pred = model_poly_d2(x_model_σp, *popt_poly_d2)
    scplt = ax.scatter(x=x_model_σp, y=y_model_E_rp_pred, marker="D", c= y_model_E_rp_pred/x_model_σp,
                     cmap="viridis",label='Efficient Frontier:\n'+poly_d2_form)
    cb = fig.colorbar(scplt, ax=ax, label='Sharpe Ratio')
    
    #find model predicted centroide expected return
    pred_centroide_Expr_list = []
    pred_centroide_Expr_list = model_poly_d2(cluster_centers_df['σp'], *popt_poly_d2)
   
   
    cluster_centers_df['Pred Centroide Expr'] =pred_centroide_Expr_list
    cluster_centers_df['Pred Centroide Sharpe Ratio'] =pred_centroide_Expr_list/cluster_centers_df['σp']
    display(cluster_centers_df)
    
    #plotting model centroide
    ax.scatter(cluster_centers_df['σp'], pred_centroide_Expr_list, marker=".", s=100, c='blue', label = 'Model Centroids')
    
    ax.set_title('Simulated Porfolio Clusters')
    ax.set_xlabel('Volatility(Risk)')
    ax.set_ylabel('Expected Return ')
    ax.legend(prop = { "size": 8 })  
    plt.show()
    
    # Silhouette Score to evaluate the clustering
    sil_score = silhouette_score(clipped_df, pred_clusters)
    print(f'Silhouette Score: {sil_score}')
    return cluster_centers_df
In [59]:
def plot_predicted_clusters_risk_profile_portfolio_allocation( log_returns, uncorrelated_weighted_portfolio_trails_simulation_df, 
                                                              most_diversify_portfolio_assets_list, cluster_centers_df):
    
    
    print('\n                           **********************************************************************\n'+
              '                                 Investment Profile Simulation And Portfolio Allocation \n'+
              '                           **********************************************************************\n')
    for i in range(len(cluster_centers_df)):
        risk_profile = cluster_centers_df.index[i]
        portfolio_risk = cluster_centers_df['σp'][i]
        predicted_expected_return = cluster_centers_df['Pred Centroide Expr'][i]
        sharpe_ratio = cluster_centers_df['Pred Centroide Sharpe Ratio'][i]
        print('\n                                    *****************************************************\n'+
              '                                      Risk Profile                : '+risk_profile+' Investment \n'+
              '                                      Simulated Risk              : '+str(round(portfolio_risk,3))+'\n'+
              '                                      Predicted Expected Return   : '+str(round(predicted_expected_return,3))+'\n'+
              '                                      Sharpe Ratio                : '+str(round(sharpe_ratio,3))+'\n'
              '                                     *****************************************************\n')
        plot_predicted_portfolio_weight( log_returns, uncorrelated_weighted_portfolio_trails_simulation_df, 
                                        most_diversify_portfolio_assets_list, portfolio_risk,risk_profile)    
        plot_asset_return( log_returns, uncorrelated_weighted_portfolio_trails_simulation_df, 
                          most_diversify_portfolio_assets_list, portfolio_risk,risk_profile)  
In [60]:
def implement_investement_profile_simulation(log_returns, uncorrelated_weighted_portfolio_trails_simulation_df,
                                             most_diversify_portfolio_assets_list, n_components):
    warnings.filterwarnings("ignore")
    fig, ax =plt.subplots(1,2,figsize=(21, 5))
    clipped_df = calculate_number_of_cluster(uncorrelated_weighted_portfolio_trails_simulation_df,n_components, ax[0]) 
    print('\n                                 *********************************************************************************\n'+
              '                                  Investement profile simulation  - Optimal Portfolio - Efficient Frontier Model \n'+
              '                                 *********************************************************************************\n')
    cluster_centers_df = implement_k_means_clusters(uncorrelated_weighted_portfolio_trails_simulation_df, clipped_df, fig, ax[1])
    plot_predicted_clusters_risk_profile_portfolio_allocation( log_returns, uncorrelated_weighted_portfolio_trails_simulation_df, 
                                                              most_diversify_portfolio_assets_list, cluster_centers_df)

implement_investement_profile_simulation(log_returns, uncorrelated_weighted_portfolio_trails_simulation_df, 
                                         most_diversify_portfolio_assets_list, 2)
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
                                 *********************************************************************************
                                  Investement profile simulation  - Optimal Portfolio - Efficient Frontier Model 
                                 *********************************************************************************

σp E_rp y_E_rp_pred error y_optimal_E_rp sharpes_ratio cluster
3536 1.480160 0.031575 0.052045 0.020469 0.031575 0.021332 4
861 1.538196 0.033908 0.054085 0.020177 0.033908 0.022044 1
1686 1.513095 0.033426 0.053263 0.019837 0.033426 0.022091 1
7572 1.458223 0.031368 0.051145 0.019777 0.031368 0.021511 4
8227 1.527153 0.033984 0.053735 0.019751 0.033984 0.022253 1
... ... ... ... ... ... ... ...
9656 1.213137 0.036272 0.036302 0.000031 0.036272 0.029899 0
252 1.259952 0.039787 0.039817 0.000030 0.039787 0.031578 0
8959 1.337616 0.044917 0.044940 0.000023 0.044917 0.033580 2
4610 1.317238 0.043660 0.043681 0.000020 0.043660 0.033145 0
2529 1.221793 0.036966 0.036977 0.000011 0.036966 0.030255 0

9897 rows × 7 columns

σp E_rp y_E_rp_pred error y_optimal_E_rp sharpes_ratio Pred Centroide Expr Pred Centroide Sharpe Ratio
Investment Profile
Moderate 1.281066 0.034786 0.041189 0.006403 0.034786 0.027137 0.041297 0.032236
Conservative 1.537969 0.044667 0.054027 0.009360 0.044667 0.029042 0.054078 0.035162
Agressive 1.376557 0.038718 0.047131 0.008413 0.038718 0.028125 0.047176 0.034271
Very Aggressive 1.643127 0.048177 0.056376 0.008199 0.048177 0.029321 0.056524 0.034400
Very Conservative 1.455492 0.041679 0.050990 0.009312 0.041679 0.028633 0.051028 0.035059
Silhouette Score: 0.9673068137975777

                           **********************************************************************
                                 Investment Profile Simulation And Portfolio Allocation 
                           **********************************************************************


                                    *****************************************************
                                      Risk Profile                : Moderate Investment 
                                      Simulated Risk              : 1.281
                                      Predicted Expected Return   : 0.041
                                      Sharpe Ratio                : 0.032
                                     *****************************************************

'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
Ticker Company Sector Industry Weight Asset Espected Returns
ENB Enbridge Inc. Energy Oil & Gas Storage/Transport 0.016000 0.037000
BN Brookfield Corporation Financial Services Asset Management 0.024000 0.047000
PEY Peyto Exploration & Development Corp. Energy Oil & Gas Exploration and Production 0.040000 0.031000
BMO Bank of Montreal Financial Services Banks 0.044000 0.030000
DOL Dollarama Inc. Consumer Defensive Retail Defensive 0.060000 0.026000
TD Toronto-Dominion Bank Financial Services Banks 0.068000 0.024000
DOO BRP Inc. Consumer Cyclical Vehicles & Parts 0.108000 0.014000
NGD New Gold Inc. Basic Materials Metals & Mining 0.132000 0.074000
IGM IGM Financial Inc. Financial Services Asset Management 0.136000 0.075000
TVE Tamarack Valley Energy Ltd. Energy Oil & Gas Exploration and Production 0.164000 0.000000
CNQ Canadian Natural Resources Limited Energy Oil & Gas Exploration and Production 0.208000 0.093000
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
                                    *****************************************************
                                      Risk Profile                : Conservative Investment 
                                      Simulated Risk              : 1.538
                                      Predicted Expected Return   : 0.054
                                      Sharpe Ratio                : 0.035
                                     *****************************************************

'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
Ticker Company Sector Industry Weight Asset Espected Returns
BN Brookfield Corporation Financial Services Asset Management 0.023000 0.047000
ENB Enbridge Inc. Energy Oil & Gas Storage/Transport 0.056000 0.037000
NGD New Gold Inc. Basic Materials Metals & Mining 0.066000 0.074000
IGM IGM Financial Inc. Financial Services Asset Management 0.069000 0.075000
PEY Peyto Exploration & Development Corp. Energy Oil & Gas Exploration and Production 0.076000 0.031000
BMO Bank of Montreal Financial Services Banks 0.079000 0.030000
DOL Dollarama Inc. Consumer Defensive Retail Defensive 0.092000 0.026000
TD Toronto-Dominion Bank Financial Services Banks 0.099000 0.024000
CNQ Canadian Natural Resources Limited Energy Oil & Gas Exploration and Production 0.129000 0.093000
DOO BRP Inc. Consumer Cyclical Vehicles & Parts 0.132000 0.014000
TVE Tamarack Valley Energy Ltd. Energy Oil & Gas Exploration and Production 0.178000 0.000000
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
                                    *****************************************************
                                      Risk Profile                : Agressive Investment 
                                      Simulated Risk              : 1.377
                                      Predicted Expected Return   : 0.047
                                      Sharpe Ratio                : 0.034
                                     *****************************************************

'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
Ticker Company Sector Industry Weight Asset Espected Returns
BN Brookfield Corporation Financial Services Asset Management 0.000000 0.047000
ENB Enbridge Inc. Energy Oil & Gas Storage/Transport 0.037000 0.037000
PEY Peyto Exploration & Development Corp. Energy Oil & Gas Exploration and Production 0.060000 0.031000
BMO Bank of Montreal Financial Services Banks 0.063000 0.030000
DOL Dollarama Inc. Consumer Defensive Retail Defensive 0.078000 0.026000
TD Toronto-Dominion Bank Financial Services Banks 0.086000 0.024000
NGD New Gold Inc. Basic Materials Metals & Mining 0.101000 0.074000
IGM IGM Financial Inc. Financial Services Asset Management 0.104000 0.075000
DOO BRP Inc. Consumer Cyclical Vehicles & Parts 0.123000 0.014000
CNQ Canadian Natural Resources Limited Energy Oil & Gas Exploration and Production 0.172000 0.093000
TVE Tamarack Valley Energy Ltd. Energy Oil & Gas Exploration and Production 0.175000 0.000000
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
                                    *****************************************************
                                      Risk Profile                : Very Aggressive Investment 
                                      Simulated Risk              : 1.643
                                      Predicted Expected Return   : 0.057
                                      Sharpe Ratio                : 0.034
                                     *****************************************************

'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
Ticker Company Sector Industry Weight Asset Espected Returns
BN Brookfield Corporation Financial Services Asset Management 0.031000 0.047000
NGD New Gold Inc. Basic Materials Metals & Mining 0.053000 0.074000
IGM IGM Financial Inc. Financial Services Asset Management 0.057000 0.075000
ENB Enbridge Inc. Energy Oil & Gas Storage/Transport 0.063000 0.037000
PEY Peyto Exploration & Development Corp. Energy Oil & Gas Exploration and Production 0.082000 0.031000
BMO Bank of Montreal Financial Services Banks 0.085000 0.030000
DOL Dollarama Inc. Consumer Defensive Retail Defensive 0.097000 0.026000
TD Toronto-Dominion Bank Financial Services Banks 0.104000 0.024000
CNQ Canadian Natural Resources Limited Energy Oil & Gas Exploration and Production 0.113000 0.093000
DOO BRP Inc. Consumer Cyclical Vehicles & Parts 0.135000 0.014000
TVE Tamarack Valley Energy Ltd. Energy Oil & Gas Exploration and Production 0.179000 0.000000
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
                                    *****************************************************
                                      Risk Profile                : Very Conservative Investment 
                                      Simulated Risk              : 1.455
                                      Predicted Expected Return   : 0.051
                                      Sharpe Ratio                : 0.035
                                     *****************************************************

'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
Ticker Company Sector Industry Weight Asset Espected Returns
BN Brookfield Corporation Financial Services Asset Management 0.014000 0.047000
ENB Enbridge Inc. Energy Oil & Gas Storage/Transport 0.049000 0.037000
PEY Peyto Exploration & Development Corp. Energy Oil & Gas Exploration and Production 0.069000 0.031000
BMO Bank of Montreal Financial Services Banks 0.073000 0.030000
NGD New Gold Inc. Basic Materials Metals & Mining 0.080000 0.074000
IGM IGM Financial Inc. Financial Services Asset Management 0.083000 0.075000
DOL Dollarama Inc. Consumer Defensive Retail Defensive 0.087000 0.026000
TD Toronto-Dominion Bank Financial Services Banks 0.094000 0.024000
DOO BRP Inc. Consumer Cyclical Vehicles & Parts 0.128000 0.014000
CNQ Canadian Natural Resources Limited Energy Oil & Gas Exploration and Production 0.146000 0.093000
TVE Tamarack Valley Energy Ltd. Energy Oil & Gas Exploration and Production 0.177000 0.000000
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'
'y =0.07000 * x^2 + -0.01600 * x + -0.00900'

Portfolio Stress Testing¶

Macroeconomics Key Performance Indicators(KPIs) Data Collection and Preprocessing¶

In this section, we will use Statistic Canad's API stats-can to integrate the Canadian economic factors. We will then use Principal Components Analysis(PCA) technique to select the most importance economic factors.

In [61]:
warnings.filterwarnings("ignore")
#                      --------------------------------------------------------------------------------------------                    
#                        Trade Balance: Labour force characteristics by province, monthly, seasonally adjusted
#                      --------------------------------------------------------------------------------------------

def get_trade_balance_rate(reporting_year_period, frequency_date_column ):
        frequency = frequency_date_column[0].upper()
        df = sc.table_to_df("12-10-0011-01")    
        df1 = df.loc[(df['REF_DATE'] >= reporting_year_period) & (df['Trade'] =='Trade Balance') & 
            (df['Principal trading partners']  == 'All countries'), ['REF_DATE','Trade','Principal trading partners','VALUE']]
        df1[frequency_date_column] = df1['REF_DATE'].dt.to_period(frequency)
    
        trade_balance_df = df1[[frequency_date_column, 'VALUE']]
        trade_balance_df['Trade Balance Rate'] = trade_balance_df['VALUE'].pct_change() * 100
        
        trade_balance_rate_df= trade_balance_df.groupby(frequency_date_column).mean()
        #trade_balance_rate_df = trade_balance_rate_df.rename(columns={'VALUE': 'Unemployment rate'})
        trade_balance_rate_df['Trade Balance Rate'] = round(trade_balance_rate_df['Trade Balance Rate'],1)
        trade_balance_rate_df = trade_balance_rate_df[['Trade Balance Rate']]
        trade_balance_rate_df = trade_balance_rate_df.dropna()
        return trade_balance_rate_df
    
    

#                      --------------------------------------------------------------------------------------------                    
#                        unemployment rate: Labour force characteristics by province, monthly, seasonally adjusted
#                      --------------------------------------------------------------------------------------------

def get_unemployment_rate(reporting_year_period, frequency_date_column ):
        frequency = frequency_date_column[0].upper()
        df = sc.table_to_df("14-10-0287-03")    
        df1 = df.loc[(df['REF_DATE'] >= reporting_year_period) & (df['Labour force characteristics'] =='Unemployment rate') & 
            (df['UOM']  == 'Percentage'), ['REF_DATE','Labour force characteristics','UOM','VALUE']]
        df1[frequency_date_column] = df1['REF_DATE'].dt.to_period(frequency)
    
        unemployment_rate_df = df1[[frequency_date_column, 'VALUE']]
        unemployment_rate_df= unemployment_rate_df.groupby(frequency_date_column).mean()
        unemployment_rate_df = unemployment_rate_df.rename(columns={'VALUE': 'Unemployment rate'})
        unemployment_rate_df['Unemployment rate'] = round(unemployment_rate_df['Unemployment rate'],1)
        unemployment_rate_df = unemployment_rate_df.dropna()
        return unemployment_rate_df
    
    
#                      --------------------------------------------------------------------------------------------------
#                                Financial market statistics, last Wednesday unless otherwise stated, Bank of Canada
#                      --------------------------------------------------------------------------------------------------

def get_Government_of_Canada_bonds_or_T_bill(reporting_year_period, col_value_name, rate_statement, frequency_date_column ):
    frequency = frequency_date_column[0].upper()
    
    df = sc.table_to_df("10-10-0122-01")
    df2 = df.loc[(df['REF_DATE'] >= reporting_year_period) & (df['UOM']  == 'Percent') &  (df['Rates'].str.contains(rate_statement)), 
             ['REF_DATE','Rates','UOM','VALUE']]
    
    
    df2[frequency_date_column] = df2['REF_DATE'].dt.to_period(frequency)
    df2 = df2.dropna()
    goc_bonds_or_T_bill_df = df2[[frequency_date_column, 'VALUE']]
    #goc_bonds_or_T_bill_df['VALUE'] = round(goc_bonds_or_T_bill_df['VALUE'],1)
    goc_bonds_or_T_bill_df= goc_bonds_or_T_bill_df.groupby(frequency_date_column).mean()
    goc_bonds_or_T_bill_df= goc_bonds_or_T_bill_df.rename(columns={'VALUE': col_value_name})
    goc_bonds_or_T_bill_df[col_value_name] = round(goc_bonds_or_T_bill_df[col_value_name],1)
   
    return goc_bonds_or_T_bill_df  

#                   ---------------------------------------------------------------------------------------------------------------------
#                            CPI Inflaction:The CPI measures the average change over time in the prices paid by urban consumers 
#                            for a market basket of consumer goods and services, 
#                            and it's a key indicator of inflation (ING Think) (Inflation Calculator).
#                   ---------------------------------------------------------------------------------------------------------------------


def get_CPI_inflaction_rate(reporting_year_period, frequency_date_column):
    frequency = frequency_date_column[0].upper()
    alternative_measures = 'Measure of core inflation based on a factor model, CPI-common (year-over-year percent change)'
    df = sc.table_to_df("18-10-0256-01")
    df2 = df.loc[(df['REF_DATE'] >= reporting_year_period) & (df['UOM']  == 'Percent') &  
            (df['Alternative measures'] == alternative_measures), 
           ['REF_DATE','Alternative measures','UOM','VALUE']]
    df2[frequency_date_column] = df2['REF_DATE'].dt.to_period(frequency)
    df2 = df2.dropna()
    CPI_inflaction_rate_df = df2[[frequency_date_column, 'VALUE']]
    CPI_inflaction_rate_df= CPI_inflaction_rate_df.groupby(frequency_date_column).mean()
    CPI_inflaction_rate_df= CPI_inflaction_rate_df.rename(columns={'VALUE': 'CPI Inflaction Rate'})
    CPI_inflaction_rate_df['CPI Inflaction Rate'] = round(CPI_inflaction_rate_df['CPI Inflaction Rate'],1)
    return CPI_inflaction_rate_df

#                                    -----------------------------------------------------------------------------------
                                                                   #morgage rate
#                                    -----------------------------------------------------------------------------------


def get_morgage_rate(reporting_year_period, frequency_date_column):
    frequency = frequency_date_column[0].upper()
    df = sc.table_to_df("34-10-0145-01")
    df1 = df.loc[(df['REF_DATE'] >= reporting_year_period), ['REF_DATE', 'UOM','VALUE']]
    df1[frequency_date_column] = df1['REF_DATE'].dt.to_period(frequency)
    
    get_morgage_rate_df = df1[[frequency_date_column, 'VALUE']]
    get_morgage_rate_df = get_morgage_rate_df.groupby(frequency_date_column).mean()
    get_morgage_rate_df = get_morgage_rate_df.rename(columns={'VALUE': 'Morgage Rate'})
    get_morgage_rate_df['Morgage Rate'] = round(get_morgage_rate_df['Morgage Rate'],1)
    get_morgage_rate_df = get_morgage_rate_df.dropna()
    return get_morgage_rate_df

#          -------------------------------------------------------------------------------------------------------------------------------------
#                                                                  prime rate
#                The prime interest rate is the percentage that U.S. commercial banks charge their most creditworthy customers for loans. 
#                Like all loan rates, the prime interest rate is derived from the federal funds' overnight rate, set by the Federal Reserve at 
#                meetings held eight times a year. The prime interest rate is the benchmark banks and other lenders
#                use when setting their interest rates for every category of loan from credit cards to car loans and mortgages.
#          -------------------------------------------------------------------------------------------------------------------------------------

def get_prime_rate(reporting_year_period, frequency_date_column):
    frequency = frequency_date_column[0].upper()
    df = sc.table_to_df("10-10-0145-01")
    df1 = df.loc[(df['REF_DATE'] >= reporting_year_period), ['REF_DATE', 'UOM','VALUE']]
    df1[frequency_date_column] = df1['REF_DATE'].dt.to_period(frequency)
    get_prime_rate_df = df1[[frequency_date_column, 'VALUE']]
    get_prime_rate_df.set_index(frequency_date_column, inplace=True)
    get_prime_rate_df = get_prime_rate_df.groupby(frequency_date_column).mean()
    get_prime_rate_df = get_prime_rate_df.rename(columns={'VALUE': 'Prime Rate'})
    get_prime_rate_df['Prime Rate'] = round(get_prime_rate_df['Prime Rate'],1)
    get_prime_rate_df = get_prime_rate_df.dropna()
    return get_prime_rate_df

#                       ----------------------------------------------------------------------------------------------------
#                                               House Price Index (house and land)
#                       ----------------------------------------------------------------------------------------------------

def get_house_price_index(reporting_year_period, frequency_date_column):
    frequency = frequency_date_column[0].upper()
    df = sc.table_to_df("18-10-0205-02")    
    df1 = df.loc[(df['REF_DATE'] >= reporting_year_period) & (df['GEO'] =='Canada') & 
                 (df['New housing price indexes'] =='Total (house and land)')
                 , ['REF_DATE','New housing price indexes', 'VALUE']]
    df1[frequency_date_column] = df1['REF_DATE'].dt.to_period(frequency)
    
    get_house_price_index_df = df1[[frequency_date_column, 'VALUE']]
    get_house_price_index_df.set_index(frequency_date_column, inplace=True)
    get_house_price_index_df = ((get_house_price_index_df / get_house_price_index_df.shift(1)) - 1)*100

    get_house_price_index_df= get_house_price_index_df.groupby(frequency_date_column).mean()
    get_house_price_index_df = get_house_price_index_df.rename(columns={'VALUE': 'House Price Index(house and land)'})
    get_house_price_index_df['House Price Index(house and land)'] = round(get_house_price_index_df[
        'House Price Index(house and land)'],1)
    get_house_price_index_df = get_house_price_index_df.dropna()
    return get_house_price_index_df.tail(60)

#                             -----------------------------------------------------------------------------------------
#                                                     Real GDP growth Seasonal adjustment
#                             -----------------------------------------------------------------------------------------

def get_Real_GDP_growth(reporting_year_period, frequency_date_column):
    frequency = frequency_date_column[0].upper()
    df = sc.table_to_df("36-10-0434-02")    
    df1 = df.loc[(df['REF_DATE'] >= reporting_year_period) & (df['GEO'] =='Canada') &
                 (df['North American Industry Classification System (NAICS)'] =='All industries [T001]'),
                 ['REF_DATE','Seasonal adjustment', 'VALUE']]
    df1[frequency_date_column] = df1['REF_DATE'].dt.to_period(frequency)
    
    get_Real_GDP_growth_df = df1[[frequency_date_column, 'VALUE']]
    get_Real_GDP_growth_df.set_index(frequency_date_column, inplace=True)
    
    #get_Real_GDP_growth_df= get_Real_GDP_growth_df.groupby('MONTH_YEAR').sum()
    get_Real_GDP_growth_df= get_Real_GDP_growth_df.groupby(frequency_date_column).mean()
    get_Real_GDP_growth_df = ((get_Real_GDP_growth_df / get_Real_GDP_growth_df.shift(1)) - 1)*100
    get_Real_GDP_growth_df = get_Real_GDP_growth_df.rename(columns={'VALUE': 'Real GDP growth Seasonal adjustment'})
    get_Real_GDP_growth_df['Real GDP growth Seasonal adjustment'] = round(get_Real_GDP_growth_df[
        'Real GDP growth Seasonal adjustment'],1)
    get_Real_GDP_growth_df = get_Real_GDP_growth_df.dropna()
    return get_Real_GDP_growth_df.tail(60)

#                         ------------------------------------------------------------------------------------------------------
#                                                               Marcket Valatility
                                               

#                          oronto Stock Exchange statistics1: S&P/TSX 60 VIX Index (VIXI.TS)
#                          The S&P/TSX 60 is a market-capitalization-weighted index that tracks the performance of the 60 largest
#                          companies listed on the Toronto Stock Exchange (TSX). The S&P/TSX Composite, on the other hand,
#                          is a broader index that includes all common stocks and income trust units listed on the TSX

#                          The S&P/TSX Composite provides a more comprehensive view of the Canadian stock market. 
#                          3It includes a wider range of companies, from small-cap to large-cap. This makes it a good 
#                          choice for investors who want to diversify their portfolio across different sectors and market capitalizations. 
#                          ://www.spglobal.com/spdji/en/indices/equity/sp-tsx-composite-index/#overview
#                          Toronto Stock Exchange statisticand  :S&P/TSX 60 VIX Index (VIXI.TS),
#                          S&P/TSX Venture Composite Index (^SPCDNX) and S&P/TSX Composite index (^GSPTSE)
#                          The S&P 500 index, or Standard & Poor’s 500, is a very important index that tracks
#                          the performance of the stocks of 500 large-cap companies in the U.S. The ticker symbol for the S&P 500 index is ^GSPC.
#                          The DJIA tracks the stock prices of 30 of the biggest American companies. 
#                          The S&P 500 tracks 500 large-cap American stocks. Both offer a big-picture view of the state of the 
#                          stock markets in general
#                          https://www.investopedia.com/ask/answers/difference-between-dow-jones-industrial-average-and-sp-500/#:
#                          ~:text=Key%20Takeaways,the%20stock%20markets%20in%20general.
#                        ---------------------------------------------------------------------------------------------------------------

def get_market_index_volatility(reporting_year_period, frequency_date_column, market_index_list = ['^GSPTSE', '^GSPC', '^DJI']):
    frequency = frequency_date_column[0].upper()

    start_date = reporting_year_period
    end_date = date.today()
    #index_yahoo_adj_close_price_data = yf.download(market_index_list, start_date, end_date, ['Adj Close'], period ='max')
    #market_adj_close_price_df = index_yahoo_adj_close_price_data['Adj Close']
    market_adj_close_price_df = create_adj_close_price_df(reporting_year_period, market_index_list)
    
    market_adj_close_price_log_return_df = np.log(market_adj_close_price_df/ market_adj_close_price_df.shift(1))  
    # drop columns with all NaN's
    market_adj_close_price_log_return_df = market_adj_close_price_log_return_df.dropna(axis=0)
    
    #Market volatility
        
    market_volatility_df = market_adj_close_price_log_return_df.rolling(center=False,window= 252).std() * np.sqrt(252)
    for col in list(market_volatility_df.columns):
        market_volatility_df = market_volatility_df.rename(columns={col: 'Market '+col+' Volatility Index'})
    
    market_volatility_df = market_volatility_df.dropna(axis=0)
    
    market_volatility_df[frequency_date_column] = pd.to_datetime(market_volatility_df.index, format = '%m/%Y')
    market_volatility_df[frequency_date_column] = market_volatility_df[frequency_date_column].dt.to_period(frequency)
        
    #market_adj_close_price_log_return_frequency_df = market_volatility_df
    market_volatility_df.set_index(frequency_date_column, inplace=True)
    market_volatility_index_df = market_volatility_df.groupby(frequency_date_column).mean()
    market_volatility_index_df = round(market_volatility_index_df,1)
    market_volatility_index_df = market_volatility_index_df.dropna(axis=0)
    return market_volatility_index_df
    #if frequency == 'M' :
    #    return market_volatility_index_df.tail(60)
    #else:
    #    return market_volatility_index_df.tail(20)
    
#-------------------------------------------------------Governement of Canada Bonds average----------------------------------------------

def goc_bonds_average(reporting_year_period, frequency_date_column):
    goc_bonds_average_yield_1_3_df = get_Government_of_Canada_bonds_or_T_bill(reporting_year_period, 'GOC Marketable Bonds Average Yield: 1-3 year',
        'Government of Canada marketable bonds, average yield: 1-3 year', frequency_date_column)
    goc_bonds_average_yield_5_10_df = get_Government_of_Canada_bonds_or_T_bill(reporting_year_period, 'GOC Marketable Bonds Average Yield: 5-10 year',
        'Government of Canada marketable bonds, average yield: 5-10 year', frequency_date_column)
    goc_bonds_average_yield_3_5_df = get_Government_of_Canada_bonds_or_T_bill(reporting_year_period, 'GOC Marketable Bonds Average Yield: 3-5 year',
        'Government of Canada marketable bonds, average yield: 3-5 year', frequency_date_column)
    goc_bonds_average_yield_over_10_years_df = get_Government_of_Canada_bonds_or_T_bill(reporting_year_period, 
                                                                                        'GOC Marketable Bonds Average Yield: over 10 years',
         'Government of Canada marketable bonds, average yield: over 10 years', frequency_date_column)
    
    goc_bonds_average_df = goc_bonds_average_yield_1_3_df.merge(goc_bonds_average_yield_5_10_df, 
                                                                on= frequency_date_column, how='inner') \
                                                         .merge(goc_bonds_average_yield_3_5_df, on= frequency_date_column, how='inner') \
                                                         .merge(goc_bonds_average_yield_over_10_years_df, on= frequency_date_column, how='inner') 
    return goc_bonds_average_df

#------------------------- Governement of Canada Benchmark Bonds Yield -------------------------------------------------------------------

def goc_benchmark_bonds_yield(reporting_year_period, frequency_date_column):
    goc_benchmark_bonds_yield_over_2_year_df = \
            get_Government_of_Canada_bonds_or_T_bill(reporting_year_period, 'GOC benchmark bond yields: 2 year',
                                            'Selected Government of Canada benchmark bond yields: 2 year' , frequency_date_column)
    goc_benchmark_bonds_yield_over_3_year_df = \
            get_Government_of_Canada_bonds_or_T_bill(reporting_year_period, 'GOC benchmark bond yields: 3 year',
                                                'Selected Government of Canada benchmark bond yields: 3 year', frequency_date_column)
    goc_benchmark_bonds_yield_over_5_year_df = \
            get_Government_of_Canada_bonds_or_T_bill(reporting_year_period, 'GOC benchmark bond yields: 5 year',
                                    'Selected Government of Canada benchmark bond yields: 5 year', frequency_date_column)
    goc_benchmark_bonds_yield_over_7_year_df = \
            get_Government_of_Canada_bonds_or_T_bill(reporting_year_period, 'GOC benchmark bond yields: 7 year',
                                    'Selected Government of Canada benchmark bond yields: 7 year', frequency_date_column)
    goc_benchmark_bonds_yield_over_10_years_df = \
            get_Government_of_Canada_bonds_or_T_bill(reporting_year_period, 'GOC benchmark bond yields: 10 years',
                                    'Selected Government of Canada benchmark bond yields: 10 years', frequency_date_column)
    goc_benchmark_bonds_yield_over_long_term_df = \
            get_Government_of_Canada_bonds_or_T_bill(reporting_year_period, 'GOC benchmark bond yields: long term',
                                         'Selected Government of Canada benchmark bond yields: long term', frequency_date_column)
    goc_benchmark_bonds_yield_df = \
            goc_benchmark_bonds_yield_over_2_year_df.merge(goc_benchmark_bonds_yield_over_3_year_df, 
                                                       on= frequency_date_column, how='inner') \
                                .merge(goc_benchmark_bonds_yield_over_5_year_df, on= frequency_date_column, how='inner') \
                                .merge(goc_benchmark_bonds_yield_over_7_year_df, on= frequency_date_column, how='inner') \
                                .merge(goc_benchmark_bonds_yield_over_10_years_df, on= frequency_date_column, how='inner') \
                                .merge(goc_benchmark_bonds_yield_over_long_term_df, on= frequency_date_column, how='inner')  
                                
    return goc_benchmark_bonds_yield_df

 #------------------------------------------------------------Governement of Canada Treasurt Bills --------------------------------------------
def Treasury_bills(reporting_year_period, frequency_date_column):
    
    Treasury_bills_1_month_df = get_Government_of_Canada_bonds_or_T_bill(reporting_year_period, 'Treasury bills: 1 month',
                                'Treasury bills: 1 month', frequency_date_column)
    Treasury_bills_2_month_df = get_Government_of_Canada_bonds_or_T_bill(reporting_year_period, 'Treasury bills: 2 month',
                                'Treasury bills: 2 month', frequency_date_column)
    Treasury_bills_3_month_df = get_Government_of_Canada_bonds_or_T_bill(reporting_year_period, 'Treasury bills: 3 month',
                                 'Treasury bills: 3 month', frequency_date_column)
    Treasury_bills_6_month_df = get_Government_of_Canada_bonds_or_T_bill(reporting_year_period, 'Treasury bills: 6 month',
                                'Treasury bills: 6 month', frequency_date_column)
    Treasury_bills_1_year_df = get_Government_of_Canada_bonds_or_T_bill(reporting_year_period, 'Treasury bills: 1 year',
                                'Treasury bills: 1 year', frequency_date_column)
    Treasury_bills_df = Treasury_bills_1_month_df.merge(Treasury_bills_2_month_df, on= frequency_date_column, how='inner') \
                                            .merge(Treasury_bills_3_month_df, on= frequency_date_column, how='inner') \
                                            .merge(Treasury_bills_6_month_df, on= frequency_date_column, how='inner') \
                                            .merge(Treasury_bills_1_year_df, on=frequency_date_column, how='inner')  
    return Treasury_bills_df

#-----------------------------------------  Other Economic Factors ------------------------------------------------------------------

def other_economic_factors(reporting_year_period, frequency_date_column):
    unemployment_rate_df = get_unemployment_rate(reporting_year_period, frequency_date_column)
    CPI_inflaction_rate_df = get_CPI_inflaction_rate(reporting_year_period, frequency_date_column)
    get_morgage_rate_df = get_morgage_rate(reporting_year_period, frequency_date_column)
    get_prime_rate_df = get_prime_rate(reporting_year_period, frequency_date_column)
    get_house_price_index_df = get_house_price_index(reporting_year_period, frequency_date_column)
    get_Real_GDP_growth_df = get_Real_GDP_growth(reporting_year_period, frequency_date_column)
    market_index_volatility_df = get_market_index_volatility(reporting_year_period, frequency_date_column)
    trade_balance_rate_df = get_trade_balance_rate(reporting_year_period, frequency_date_column)
    
    other_economic_factors_df = CPI_inflaction_rate_df.merge(get_morgage_rate_df, on= frequency_date_column, how='inner') \
                                                    .merge(get_prime_rate_df, on= frequency_date_column, how='inner') \
                                                .merge(get_house_price_index_df, on= frequency_date_column, how='inner') \
                                                .merge(unemployment_rate_df, on= frequency_date_column, how='inner') \
                                                .merge(get_Real_GDP_growth_df, on= frequency_date_column, how='inner') \
                                                .merge(market_index_volatility_df, on= frequency_date_column, how='inner')  
    
    return other_economic_factors_df

#-----------------------------------------------------------All the Economic Factors -----------------------------------------
def get_economic_factors_df(reporting_year_period, reporting_frequency):
     #set reporting frequency
        
    if reporting_frequency.capitalize() == 'Month' or reporting_frequency.capitalize() == 'Quarter':
        frequency_date_column = reporting_frequency.capitalize() + '_Year'
        #frequency = reporting_frequency[0].upper()
        
        goc_bonds_average_df = goc_bonds_average(reporting_year_period, frequency_date_column)
        goc_benchmark_bonds_yield_df = goc_benchmark_bonds_yield(reporting_year_period, frequency_date_column) 
        Treasury_bills_df = Treasury_bills(reporting_year_period, frequency_date_column)
        other_economic_factors_df = other_economic_factors(reporting_year_period, frequency_date_column)
    
        economic_factors_df = goc_bonds_average_df.merge(goc_benchmark_bonds_yield_df, on= frequency_date_column, how='inner') \
                                            .merge(Treasury_bills_df, on= frequency_date_column, how='inner') \
                                            .merge(other_economic_factors_df, on= frequency_date_column, how='inner')  

        
        return economic_factors_df
    else:
        return 'The reporting frequency should be alphanbetic, Month or Qurater'
    
#-------------------------------------------------------------Macroeconomics factors Plotting---------------------------------------

def annotate_bars(ax):# this function is generated by ChatGPT
    for p in ax.patches:
        width, height = p.get_width(), p.get_height()
        x, y = p.get_xy() 
        ax.annotate(f'{height:.1f}', (x + width/2, y + height/2), ha='center', va='center', fontsize=10, color='black')


def get_economic_factors_barplotting(goc_bonds_average_df, goc_benchmark_bonds_yield_df,Treasury_bills_df, other_economic_factors_df ):
    fig, axes =plt.subplots(4,1,figsize=(20, 35), constrained_layout=True)  
   
        
    bar_width = 0.7 
        
    bar0 = goc_bonds_average_df.plot(kind='bar', width=bar_width, stacked=True, ax = axes[0])
    bar0.set_title('Governement of Canada Bonds Average',color='black')
    bar0.legend(loc='best')
    annotate_bars(axes[0])
        
    bar1 = goc_benchmark_bonds_yield_df.plot(kind='bar', width=bar_width, stacked=True, ax = axes[1]) 
    bar1.set_title('Governement of Canada Benchmark Bonds Yield',color='black')
    bar1.legend(loc='best')
    annotate_bars(axes[1])
        
    bar2 = Treasury_bills_df.plot(kind='bar', width=bar_width, stacked=True, ax = axes[2]) 
    bar2.set_title("Governement of Canada Treasury Bills",color='black')
    bar2.legend(loc='best')
    annotate_bars(axes[2])
       
    bar3 = other_economic_factors_df.plot(kind='bar', width=bar_width, stacked=True, ax = axes[3]) 
    bar3.set_title('Governement of Canada Other Economic Factirs',color='black')
    bar3.legend(loc='best')
    annotate_bars(axes[3])
        
        
#----------------------------Principal Components Analysis(PCA) to select most importance economic factors ---------------------------------

def selecting_importent_economic_factors_treshold_method_PCA(df,threshold):
    
    return df[(df.abs() > threshold).any(axis=1)].index.to_list()

def setting_PCA_for_economic_factors(economic_factors_df):
    # economic indicators dataset
   # economic_factors_df = get_economic_factors_df(reporting_year_period, reporting_frequency)

    # Standardizing the data
    scaler = StandardScaler()
    scaled_data_df = scaler.fit_transform(economic_factors_df)

    # Applying PCA
    all_pca = PCA(n_components=None)  # Use all components to find the best number of important indicators
    all_principal_components = all_pca.fit_transform(scaled_data_df)

    # Explained variance
    explained_variance = all_pca.explained_variance_ratio_

    # Principal Component Loadings(coefficients)
    loadings_matrix = all_pca.components_

    # Create a DataFrame for loadings 
    loadings_matrix_df = pd.DataFrame(loadings_matrix.T, columns=[f'PC{i+1}' for i in range(loadings_matrix.shape[0])], 
                                      index=economic_factors_df.columns)
    return loadings_matrix_df, explained_variance

def get_num_components(explained_variance,cumulative_variance_treshold = 0.9):    
    # Determine the number of components explaining the cumulative varience treshold of the variance
    cumulative_variance = explained_variance.cumsum()
    return  (cumulative_variance <= cumulative_variance_treshold).sum() + 1

def select_top_components_df(loadings_matrix_df, num_components, threshold_for_high_loadings = 0.5):
    # Select top components
    return loadings_matrix_df.iloc[:, :num_components]
     
def select_top_indicators_df(loadings_matrix_df, num_components, threshold_for_high_loadings = 0.5):
    # Select top components
    selected_components_df = loadings_matrix_df.iloc[:, :num_components]
    # Find indicators with high loadings
    return selected_components_df[(selected_components_df.abs() > threshold_for_high_loadings).any(axis=1)]

    
def plot_explained_variance_(economic_factors_df):
    
    loadings_matrix_df, explained_variance =  setting_PCA_for_economic_factors(economic_factors_df)
    
    # Print explained variance
    
    explained_variance_df = pd.DataFrame(explained_variance).T
    explained_variance_df.columns = loadings_matrix_df.columns
    display(explained_variance_df)
    
    
    # Plotting the explained variance
    plt.figure(figsize=(10, 6))
    plt.bar(range(1, len(explained_variance) + 1), explained_variance, alpha=0.5, align='center', label='individual explained variance')
    plt.step(range(1, len(explained_variance) + 1), np.cumsum(explained_variance), where='mid', label='cumulative explained variance')
    plt.xlabel('Principal Components')
    plt.ylabel('Explained Variance Ratio')
    plt.title('Explained Variance by Principal Components')
    plt.legend(loc='best')
    plt.show()
    

def plotting_corr_matrix(economic_factors_matrix, title):
    
    g = sns.clustermap(economic_factors_matrix ,  method = 'complete', cmap   = 'RdBu', annot  = True, annot_kws = {'size': 15},figsize=(20, 15),
                      cbar_kws={"shrink": 0.6, "aspect": 15})   
    
    plt.subplots_adjust(top=0.85)
    plt.setp(g.ax_heatmap.get_xticklabels(), rotation=90)
    plt.setp(g.ax_heatmap.get_yticklabels(), rotation=360)
    g.cax.set_position([1.02, 0.3, 0.03, 0.4])  # [left, bottom, width, height]
    g.cax.set_ylabel('Correlation Coefficient', rotation=270, labelpad=15)  # Rotate label
    g.fig.suptitle(title, y=0.9, fontsize=12)

    
def get_most_important_economic_factors_list(economic_factors_df,
                                             cumulative_variance_treshold = 1, threshold_for_highest_loadings = 0.5):
        
        #plot_explained_variance_(reporting_year_period, reporting_frequency)
        #plot_explained_variance_(economic_factors_df)
        loadings_matrix_df, explained_variance = setting_PCA_for_economic_factors(economic_factors_df)
        #print('\nloadings_matrix_df\n')
        #display(loadings_matrix_df)
        num_components = get_num_components(explained_variance,cumulative_variance_treshold)
        top_components_df = select_top_components_df(loadings_matrix_df, num_components, threshold_for_highest_loadings)
        #print('\ntop_components_df\n')
        #display(top_components_df)
        #print('\ntop_indicators_df\n')
        top_indicators_df = select_top_indicators_df(loadings_matrix_df, num_components, threshold_for_highest_loadings)
        #display(top_indicators_df)
        most_important_economic_factors_list = selecting_importent_economic_factors_treshold_method_PCA(top_indicators_df,
                                                                                 threshold_for_highest_loadings)
        
        return most_important_economic_factors_list
    
    
def plotting_most_important_economic_factors_list(economic_factors_df,
                                             cumulative_variance_treshold = 1, threshold_for_highest_loadings = 0.5):
        
        #plot_explained_variance_(reporting_year_period, reporting_frequency)
        plot_explained_variance_(economic_factors_df)
        loadings_matrix_df, explained_variance = setting_PCA_for_economic_factors(economic_factors_df)
        print('\nloadings_matrix_df\n')
        display(loadings_matrix_df)
        num_components = get_num_components(explained_variance,cumulative_variance_treshold)
        top_components_df = select_top_components_df(loadings_matrix_df, num_components, threshold_for_highest_loadings)
        print('\ntop_components_df\n')
        display(top_components_df)
        print('\ntop_indicators_df\n')
        top_indicators_df = select_top_indicators_df(loadings_matrix_df, num_components, threshold_for_highest_loadings)
        display(top_indicators_df)
        most_important_economic_factors_list = selecting_importent_economic_factors_treshold_method_PCA(top_indicators_df,
                                                                                 threshold_for_highest_loadings)
        
    

def get_most_important_economic_factors_df(economic_factors_df, most_important_economic_factors_list):
        return economic_factors_df[most_important_economic_factors_list]
        

def get_most_important_economic_factors_matrix(most_important_economic_factors_df):
    return  generate_correlation_matrix(most_important_economic_factors_df)   

def plotting_most_important_economic_factors_corr_clustermap(most_important_economic_factors_matrix):
    
        #PCA couple with covarience matrice to select most important factors
       
        plotting_corr_matrix(most_important_economic_factors_matrix,'Most Important Economic Factors Correlation Matrix Cluster Map using PCA')
        

#----------------------------------------------------------------------Main Data Setting-------------------------------------------
reporting_year_period = start_date(365*5)
reporting_frequency = 'Quarter'

cumulative_variance_treshold = 1.0
threshold_for_highest_loadings = 0.5
correlation_coefficient_treshold = 0.3

#Economic Factors Data Frames
goc_bonds_average_df = goc_bonds_average(reporting_year_period, reporting_frequency)
goc_benchmark_bonds_yield_df = goc_benchmark_bonds_yield(reporting_year_period, reporting_frequency) 
Treasury_bills_df = Treasury_bills(reporting_year_period, reporting_frequency)
other_economic_factors_df = other_economic_factors(reporting_year_period, reporting_frequency)
trade_balance_rate_df = get_trade_balance_rate(reporting_year_period, reporting_frequency)
economic_factors_df = get_economic_factors_df(reporting_year_period, reporting_frequency)

#All the economic factors correlation matrice    
economic_factors_matrix =  generate_correlation_matrix(economic_factors_df)   

#Principal Components Analysis(PCA) to select  Most Important Economic Factors
most_important_economic_factors_list = get_most_important_economic_factors_list(economic_factors_df,  cumulative_variance_treshold, 
                                                                                threshold_for_highest_loadings)
most_important_economic_factors_df = get_most_important_economic_factors_df(economic_factors_df, most_important_economic_factors_list)
most_important_economic_factors_matrix = get_most_important_economic_factors_matrix(most_important_economic_factors_df)

#-------------------------------------------------Data Visualization------------------------------------------------------------------
def print_economic_factors_data_table():
    print('\n                             **********************************************************\n'+
             '                              All the Economic Factors Data Tables\n'+
              '                         *********************************************************\n')

    display(economic_factors_df)
    get_economic_factors_barplotting(goc_bonds_average_df, goc_benchmark_bonds_yield_df,Treasury_bills_df, other_economic_factors_df ) 


def print_economic_factors_data_corr_matrix():
    print('\n                             **********************************************************\n'+
             '                              All the Economic Factors Correlation Matrix\n'+
              '                         *********************************************************\n')

    display(economic_factors_matrix)
    plotting_corr_matrix(economic_factors_matrix, 'All the economic factors')
    #plotting_selected_assets_corr_mat_clustermap(economic_factors_matrix, 'All the economic factors')

def print_most_important_economic_factors():

    print('\n                        *****************************************************************************************\n'+
             '                               Principal Components Analysis(PCA) to select  Most Important Economic Factors \n'+
              '                          ****************************************************************************************\n')

    print('Principal Components Analysis(PCA) to select  Most Important Economic Factors \n')
    plotting_most_important_economic_factors_list(economic_factors_df, cumulative_variance_treshold, threshold_for_highest_loadings)
    print('\n most_important_economic_factors_df\n')    
    display(most_important_economic_factors_df)
    print('\n most_important_economic_factors_matrix\n')
    display(most_important_economic_factors_matrix)
    plotting_corr_matrix(most_important_economic_factors_matrix, 'Most Important Economic Factors correlation Matrix - PCA Method')
    #plotting_selected_assets_corr_mat_clustermap(most_important_economic_factors_matrix, 'Most Important Economic Factors correlation Matrix - PCA Method', 
    #                                             dendrogram = True)
[*********************100%%**********************]  3 of 3 completed
[*********************100%%**********************]  3 of 3 completed

Data Visualization¶

In [62]:
print_economic_factors_data_table()
                             **********************************************************
                              All the Economic Factors Data Tables
                         *********************************************************

GOC Marketable Bonds Average Yield: 1-3 year GOC Marketable Bonds Average Yield: 5-10 year GOC Marketable Bonds Average Yield: 3-5 year GOC Marketable Bonds Average Yield: over 10 years GOC benchmark bond yields: 2 year GOC benchmark bond yields: 3 year GOC benchmark bond yields: 5 year GOC benchmark bond yields: 7 year GOC benchmark bond yields: 10 years GOC benchmark bond yields: long term ... Treasury bills: 2 month Treasury bills: 3 month Treasury bills: 6 month Treasury bills: 1 year CPI Inflaction Rate Morgage Rate Prime Rate House Price Index(house and land) Unemployment rate Real GDP growth Seasonal adjustment
Quarter_Year
2020Q1 1.2 1.1 1.1 1.3 1.1 1.1 1.1 1.1 1.1 1.4 ... 1.3 1.2 1.2 1.2 2.0 4.0 1.8 0.2 4.6 -2.1
2020Q2 0.3 0.5 0.4 1.0 0.3 0.3 0.4 0.4 0.5 1.1 ... 0.2 0.2 0.3 0.3 1.6 3.9 1.2 0.1 7.8 -10.6
2020Q3 0.2 0.5 0.3 1.0 0.3 0.3 0.4 0.4 0.6 1.1 ... 0.2 0.1 0.2 0.2 1.4 3.6 1.1 0.7 5.9 8.9
2020Q4 0.2 0.6 0.4 1.1 0.2 0.3 0.4 0.5 0.7 1.2 ... 0.1 0.1 0.1 0.2 1.7 3.4 1.0 0.5 5.2 2.1
2021Q1 0.2 1.0 0.5 1.7 0.2 0.3 0.7 0.9 1.2 1.8 ... 0.1 0.1 0.1 0.1 1.7 3.3 1.1 1.2 5.5 1.2
2021Q2 0.3 1.3 0.8 1.9 0.4 0.5 0.9 1.2 1.5 2.0 ... 0.1 0.1 0.2 0.2 2.4 3.3 1.2 1.3 5.1 -0.1
2021Q3 0.4 1.2 0.8 1.8 0.5 0.6 0.9 1.1 1.3 1.8 ... 0.2 0.2 0.2 0.3 2.9 3.2 1.2 0.5 4.6 1.6
2021Q4 1.0 1.5 1.3 1.9 1.0 1.1 1.4 1.5 1.6 1.9 ... 0.1 0.1 0.3 0.7 3.1 3.4 1.3 0.6 4.1 1.6
2022Q1 1.6 2.0 1.9 2.3 1.7 1.8 2.0 2.0 2.1 2.2 ... 0.3 0.4 0.9 1.4 4.0 3.6 1.6 1.1 4.3 0.8
2022Q2 2.7 2.9 2.8 3.0 2.7 2.8 2.8 2.8 2.9 2.9 ... 1.4 1.6 2.1 2.6 5.3 4.6 2.5 0.3 3.7 1.1
2022Q3 3.5 3.0 3.2 3.0 3.5 3.4 3.1 3.0 3.0 2.9 ... 2.9 3.1 3.4 3.7 5.8 5.6 3.2 0.0 3.6 0.5
2022Q4 3.9 3.2 3.5 3.3 3.9 3.7 3.3 3.1 3.2 3.2 ... 3.9 4.1 4.2 4.4 6.0 5.8 3.7 -0.1 3.5 -0.0
2023Q1 3.9 3.0 3.3 3.1 3.8 3.6 3.2 3.0 3.0 3.1 ... 4.4 4.4 4.4 4.4 5.9 5.8 3.9 -0.2 3.8 0.6
2023Q2 4.1 3.1 3.4 3.1 4.1 3.8 3.3 3.1 3.1 3.1 ... 4.6 4.6 4.7 4.7 5.3 5.8 4.0 0.0 3.7 0.2
2023Q3 4.8 3.8 4.1 3.6 4.8 4.5 4.0 3.8 3.7 3.5 ... 5.0 5.0 5.1 5.2 4.6 6.1 4.3 -0.1 3.7 -0.1
2023Q4 4.3 3.6 3.7 3.4 4.3 4.1 3.7 3.6 3.6 3.4 ... 5.0 5.0 5.0 4.8 4.0 6.4 4.3 -0.1 3.6 0.1
2024Q1 4.2 3.4 3.6 3.4 4.1 3.9 3.5 3.4 3.4 3.3 ... 5.0 5.0 4.9 4.8 3.1 6.2 4.1 0.0 4.0 0.5
2024Q2 4.3 3.7 3.8 3.6 4.2 4.1 3.7 3.7 3.7 3.6 ... 4.8 4.8 4.8 4.6 2.5 6.1 4.1 0.2 4.0 0.4

18 rows × 21 columns

In [63]:
print_economic_factors_data_corr_matrix()
                             **********************************************************
                              All the Economic Factors Correlation Matrix
                         *********************************************************

GOC Marketable Bonds Average Yield: 1-3 year GOC Marketable Bonds Average Yield: 5-10 year GOC Marketable Bonds Average Yield: 3-5 year GOC Marketable Bonds Average Yield: over 10 years GOC benchmark bond yields: 2 year GOC benchmark bond yields: 3 year GOC benchmark bond yields: 5 year GOC benchmark bond yields: 7 year GOC benchmark bond yields: 10 years GOC benchmark bond yields: long term ... Treasury bills: 2 month Treasury bills: 3 month Treasury bills: 6 month Treasury bills: 1 year CPI Inflaction Rate Morgage Rate Prime Rate House Price Index(house and land) Unemployment rate Real GDP growth Seasonal adjustment
GOC Marketable Bonds Average Yield: 1-3 year 1.000000 0.976199 0.993209 0.951231 0.999462 0.998315 0.989338 0.977915 0.967320 0.952575 ... 0.966132 0.973636 0.987148 0.996983 0.725500 0.972380 0.989560 -0.744187 -0.746375 -0.031706
GOC Marketable Bonds Average Yield: 5-10 year 0.976199 1.000000 0.992777 0.992448 0.978538 0.985369 0.995928 0.999459 0.998761 0.991043 ... 0.913679 0.925168 0.945192 0.962425 0.739110 0.920224 0.952833 -0.625150 -0.802838 0.009622
GOC Marketable Bonds Average Yield: 3-5 year 0.993209 0.992777 1.000000 0.975615 0.994331 0.997760 0.998818 0.993509 0.987007 0.973534 ... 0.936388 0.947076 0.966219 0.983328 0.756007 0.946726 0.971334 -0.693431 -0.785146 -0.017584
GOC Marketable Bonds Average Yield: over 10 years 0.951231 0.992448 0.975615 1.000000 0.954739 0.963174 0.980846 0.989952 0.995923 0.998236 ... 0.885694 0.898744 0.918627 0.937023 0.746162 0.886569 0.927391 -0.561822 -0.805263 0.029892
GOC benchmark bond yields: 2 year 0.999462 0.978538 0.994331 0.954739 1.000000 0.998947 0.990991 0.979915 0.970160 0.955397 ... 0.962590 0.970533 0.985163 0.996038 0.734083 0.969309 0.987723 -0.735073 -0.750504 -0.021169
GOC benchmark bond yields: 3 year 0.998315 0.985369 0.997760 0.963174 0.998947 1.000000 0.995408 0.986585 0.977878 0.962796 ... 0.953059 0.962051 0.978696 0.992004 0.740256 0.962145 0.982265 -0.720636 -0.765741 -0.014608
GOC benchmark bond yields: 5 year 0.989338 0.995928 0.998818 0.980846 0.990991 0.995408 1.000000 0.996909 0.991605 0.978755 ... 0.929439 0.940004 0.960019 0.977644 0.751887 0.937894 0.966160 -0.668349 -0.793680 -0.001892
GOC benchmark bond yields: 7 year 0.977915 0.999459 0.993509 0.989952 0.979915 0.986585 0.996909 1.000000 0.997936 0.988589 ... 0.915759 0.926639 0.946566 0.963724 0.735470 0.920864 0.954083 -0.626895 -0.806113 0.007730
GOC benchmark bond yields: 10 years 0.967320 0.998761 0.987007 0.995923 0.970160 0.977878 0.991605 0.997936 1.000000 0.994981 ... 0.904050 0.915834 0.935894 0.953085 0.736259 0.907991 0.943930 -0.591373 -0.811841 0.030765
GOC benchmark bond yields: long term 0.952575 0.991043 0.973534 0.998236 0.955397 0.962796 0.978755 0.988589 0.994981 1.000000 ... 0.898326 0.909881 0.926861 0.940557 0.726525 0.894927 0.935540 -0.562839 -0.794732 0.025609
Treasury bills: 1 month 0.958881 0.903818 0.927301 0.876058 0.955022 0.944663 0.919827 0.906120 0.894311 0.889460 ... 0.999360 0.997743 0.990470 0.974308 0.569049 0.978951 0.987231 -0.765780 -0.633458 -0.047291
Treasury bills: 2 month 0.966132 0.913679 0.936388 0.885694 0.962590 0.953059 0.929439 0.915759 0.904050 0.898326 ... 1.000000 0.999162 0.994025 0.980071 0.587856 0.982410 0.991599 -0.768520 -0.644867 -0.045324
Treasury bills: 3 month 0.973636 0.925168 0.947076 0.898744 0.970533 0.962051 0.940004 0.926639 0.915834 0.909881 ... 0.999162 1.000000 0.997109 0.986143 0.612922 0.986454 0.995128 -0.770248 -0.658274 -0.048666
Treasury bills: 6 month 0.987148 0.945192 0.966219 0.918627 0.985163 0.978696 0.960019 0.946566 0.935894 0.926861 ... 0.994025 0.997109 1.000000 0.995380 0.654033 0.989194 0.999187 -0.764929 -0.682239 -0.045914
Treasury bills: 1 year 0.996983 0.962425 0.983328 0.937023 0.996038 0.992004 0.977644 0.963724 0.953085 0.940557 ... 0.980071 0.986143 0.995380 1.000000 0.709570 0.982448 0.995893 -0.761639 -0.720879 -0.036478
CPI Inflaction Rate 0.725500 0.739110 0.756007 0.746162 0.734083 0.740256 0.751887 0.735470 0.736259 0.726525 ... 0.587856 0.612922 0.654033 0.709570 1.000000 0.629513 0.666817 -0.541372 -0.751794 0.017866
Morgage Rate 0.972380 0.920224 0.946726 0.886569 0.969309 0.962145 0.937894 0.920864 0.907991 0.894927 ... 0.982410 0.986454 0.989194 0.982448 0.629513 1.000000 0.987620 -0.805028 -0.617461 -0.090534
Prime Rate 0.989560 0.952833 0.971334 0.927391 0.987723 0.982265 0.966160 0.954083 0.943930 0.935540 ... 0.991599 0.995128 0.999187 0.995893 0.666817 0.987620 1.000000 -0.762774 -0.694479 -0.048387
House Price Index(house and land) -0.744187 -0.625150 -0.693431 -0.561822 -0.735073 -0.720636 -0.668349 -0.626895 -0.591373 -0.562839 ... -0.768520 -0.770248 -0.764929 -0.761639 -0.541372 -0.805028 -0.762774 1.000000 0.388159 0.275270
Unemployment rate -0.746375 -0.802838 -0.785146 -0.805263 -0.750504 -0.765741 -0.793680 -0.806113 -0.811841 -0.794732 ... -0.644867 -0.658274 -0.682239 -0.720879 -0.751794 -0.617461 -0.694479 0.388159 1.000000 -0.353492
Real GDP growth Seasonal adjustment -0.031706 0.009622 -0.017584 0.029892 -0.021169 -0.014608 -0.001892 0.007730 0.030765 0.025609 ... -0.045324 -0.048666 -0.045914 -0.036478 0.017866 -0.090534 -0.048387 0.275270 -0.353492 1.000000

21 rows × 21 columns

In [64]:
print_most_important_economic_factors()
                        *****************************************************************************************
                               Principal Components Analysis(PCA) to select  Most Important Economic Factors 
                          ****************************************************************************************

Principal Components Analysis(PCA) to select  Most Important Economic Factors 

PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10 PC11 PC12 PC13 PC14 PC15 PC16 PC17 PC18
0 0.85935 0.068931 0.03383 0.024056 0.007083 0.004152 0.001816 0.000543 0.000109 0.000046 0.000031 0.000021 0.000012 0.000007 0.000005 0.000004 0.000002 4.703571e-34
loadings_matrix_df

PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10 PC11 PC12 PC13 PC14 PC15 PC16 PC17 PC18
GOC Marketable Bonds Average Yield: 1-3 year 0.234760 0.026131 0.016201 -0.012893 -0.024366 0.061716 0.291109 -0.139295 0.133451 0.157802 0.515279 0.182155 0.241015 -0.101838 0.360692 -0.184581 -0.003957 0.170916
GOC Marketable Bonds Average Yield: 5-10 year 0.231555 -0.080234 -0.095232 0.153528 0.031926 0.220653 -0.045993 0.043579 -0.190157 0.094682 -0.343397 -0.213668 -0.071449 0.159755 0.474708 -0.266721 0.230909 -0.350206
GOC Marketable Bonds Average Yield: 3-5 year 0.234132 -0.029851 -0.072510 0.042401 -0.000326 0.170944 0.217199 -0.078918 0.286689 -0.068549 0.178631 -0.298389 -0.702733 0.011258 0.143179 0.156651 0.048250 0.146997
GOC Marketable Bonds Average Yield: over 10 years 0.227174 -0.122381 -0.136092 0.217986 -0.057406 0.134132 -0.429816 -0.114250 0.444101 -0.171831 -0.208001 0.320040 -0.023063 0.076492 0.068157 -0.254903 -0.325177 0.221735
GOC benchmark bond yields: 2 year 0.234779 0.013805 0.008394 -0.010948 -0.061867 0.068719 0.297248 -0.173324 0.017306 -0.047074 -0.160140 -0.375752 0.524676 0.397393 0.059814 0.058629 -0.202109 0.198754
GOC benchmark bond yields: 3 year 0.234758 -0.005337 -0.016723 0.008134 -0.029301 0.139984 0.287062 -0.090465 0.039591 0.137423 -0.004180 -0.218928 -0.028056 -0.277546 -0.505661 -0.334065 0.068860 -0.036051
GOC benchmark bond yields: 5 year 0.233485 -0.051348 -0.076537 0.077274 0.002426 0.185022 0.216081 -0.095769 -0.199210 -0.309908 0.116107 0.498475 -0.034998 0.289604 -0.361245 -0.134422 0.216742 -0.104021
GOC benchmark bond yields: 7 year 0.231707 -0.078183 -0.090904 0.152294 0.065766 0.199976 0.051898 0.010020 -0.347636 -0.330405 -0.001623 0.058658 0.114303 -0.590334 0.179727 0.203702 -0.278951 -0.212126
GOC benchmark bond yields: 10 years 0.230046 -0.107253 -0.098440 0.189327 0.017254 0.166038 -0.119412 0.138684 -0.133025 0.001400 -0.256611 -0.100427 0.049083 0.024542 -0.254798 0.375737 0.279357 0.379159
GOC benchmark bond yields: long term 0.227639 -0.108980 -0.102382 0.240735 -0.052869 0.050951 -0.511097 -0.023798 -0.019783 0.353407 0.502917 -0.195265 0.169617 0.019484 -0.164477 0.134039 0.065868 -0.173841
Treasury bills: 1 month 0.225278 0.128380 0.259529 0.019614 0.025262 -0.352326 -0.132600 -0.161266 0.086808 -0.541618 0.152998 -0.267793 -0.031216 0.162620 -0.103672 0.115781 -0.119419 -0.296309
Treasury bills: 2 month 0.226989 0.119869 0.237662 0.011307 0.007964 -0.309849 -0.107582 -0.145893 -0.201502 -0.078164 -0.020385 0.101861 -0.005137 -0.203259 0.208998 0.083060 0.322462 0.497798
Treasury bills: 3 month 0.228918 0.111167 0.202732 0.004675 -0.015006 -0.274253 -0.119133 -0.032772 0.104932 0.042636 -0.161700 -0.009663 0.049030 -0.144779 -0.006277 -0.375099 0.378795 -0.164712
Treasury bills: 6 month 0.231878 0.088394 0.144440 -0.000347 -0.064215 -0.175812 0.048952 -0.000216 -0.009291 0.301415 -0.265119 -0.029288 -0.099210 -0.238959 -0.209504 -0.082658 -0.459009 0.058260
Treasury bills: 1 year 0.234082 0.053455 0.065061 -0.036268 -0.075239 -0.075118 0.192900 -0.096072 0.369251 0.282200 -0.215109 0.348370 0.066918 0.001790 0.046062 0.547883 0.120218 -0.373485
CPI Inflaction Rate 0.173762 -0.180793 -0.575582 -0.504796 -0.493601 -0.289764 -0.058610 0.010737 -0.074097 -0.086223 -0.010197 -0.031054 -0.013493 -0.045574 0.023215 -0.007326 0.039455 -0.009281
Morgage Rate 0.227671 0.150522 0.160041 -0.041466 -0.149447 0.043348 0.053773 0.891069 0.145476 -0.116283 0.110641 0.020574 0.072023 0.044917 0.027882 -0.039727 -0.025172 0.026488
Prime Rate 0.232769 0.079864 0.118729 0.003283 -0.045882 -0.147697 -0.017152 0.019519 -0.508541 0.292196 0.064961 0.185499 -0.310753 0.371407 0.050503 -0.003452 -0.308323 0.006737
House Price Index(house and land) -0.171404 -0.381353 -0.085620 0.663168 -0.268190 -0.450428 0.298552 0.103229 0.016639 -0.009538 0.011201 -0.002338 -0.013328 0.005792 0.027148 -0.033986 0.003009 -0.002548
Unemployment rate -0.181182 0.436201 0.172573 0.184132 -0.757262 0.304913 -0.061990 -0.167199 -0.067842 -0.048952 -0.015296 -0.015750 -0.028583 -0.042345 0.013702 0.025426 0.041119 -0.014845
Real GDP growth Seasonal adjustment -0.003512 -0.702397 0.578009 -0.267773 -0.242523 0.190165 -0.055179 -0.037586 -0.019916 -0.010577 0.004074 -0.000240 -0.015595 -0.011826 0.009457 0.008515 0.003751 -0.009416
top_components_df

PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10 PC11 PC12 PC13 PC14 PC15 PC16 PC17
GOC Marketable Bonds Average Yield: 1-3 year 0.234760 0.026131 0.016201 -0.012893 -0.024366 0.061716 0.291109 -0.139295 0.133451 0.157802 0.515279 0.182155 0.241015 -0.101838 0.360692 -0.184581 -0.003957
GOC Marketable Bonds Average Yield: 5-10 year 0.231555 -0.080234 -0.095232 0.153528 0.031926 0.220653 -0.045993 0.043579 -0.190157 0.094682 -0.343397 -0.213668 -0.071449 0.159755 0.474708 -0.266721 0.230909
GOC Marketable Bonds Average Yield: 3-5 year 0.234132 -0.029851 -0.072510 0.042401 -0.000326 0.170944 0.217199 -0.078918 0.286689 -0.068549 0.178631 -0.298389 -0.702733 0.011258 0.143179 0.156651 0.048250
GOC Marketable Bonds Average Yield: over 10 years 0.227174 -0.122381 -0.136092 0.217986 -0.057406 0.134132 -0.429816 -0.114250 0.444101 -0.171831 -0.208001 0.320040 -0.023063 0.076492 0.068157 -0.254903 -0.325177
GOC benchmark bond yields: 2 year 0.234779 0.013805 0.008394 -0.010948 -0.061867 0.068719 0.297248 -0.173324 0.017306 -0.047074 -0.160140 -0.375752 0.524676 0.397393 0.059814 0.058629 -0.202109
GOC benchmark bond yields: 3 year 0.234758 -0.005337 -0.016723 0.008134 -0.029301 0.139984 0.287062 -0.090465 0.039591 0.137423 -0.004180 -0.218928 -0.028056 -0.277546 -0.505661 -0.334065 0.068860
GOC benchmark bond yields: 5 year 0.233485 -0.051348 -0.076537 0.077274 0.002426 0.185022 0.216081 -0.095769 -0.199210 -0.309908 0.116107 0.498475 -0.034998 0.289604 -0.361245 -0.134422 0.216742
GOC benchmark bond yields: 7 year 0.231707 -0.078183 -0.090904 0.152294 0.065766 0.199976 0.051898 0.010020 -0.347636 -0.330405 -0.001623 0.058658 0.114303 -0.590334 0.179727 0.203702 -0.278951
GOC benchmark bond yields: 10 years 0.230046 -0.107253 -0.098440 0.189327 0.017254 0.166038 -0.119412 0.138684 -0.133025 0.001400 -0.256611 -0.100427 0.049083 0.024542 -0.254798 0.375737 0.279357
GOC benchmark bond yields: long term 0.227639 -0.108980 -0.102382 0.240735 -0.052869 0.050951 -0.511097 -0.023798 -0.019783 0.353407 0.502917 -0.195265 0.169617 0.019484 -0.164477 0.134039 0.065868
Treasury bills: 1 month 0.225278 0.128380 0.259529 0.019614 0.025262 -0.352326 -0.132600 -0.161266 0.086808 -0.541618 0.152998 -0.267793 -0.031216 0.162620 -0.103672 0.115781 -0.119419
Treasury bills: 2 month 0.226989 0.119869 0.237662 0.011307 0.007964 -0.309849 -0.107582 -0.145893 -0.201502 -0.078164 -0.020385 0.101861 -0.005137 -0.203259 0.208998 0.083060 0.322462
Treasury bills: 3 month 0.228918 0.111167 0.202732 0.004675 -0.015006 -0.274253 -0.119133 -0.032772 0.104932 0.042636 -0.161700 -0.009663 0.049030 -0.144779 -0.006277 -0.375099 0.378795
Treasury bills: 6 month 0.231878 0.088394 0.144440 -0.000347 -0.064215 -0.175812 0.048952 -0.000216 -0.009291 0.301415 -0.265119 -0.029288 -0.099210 -0.238959 -0.209504 -0.082658 -0.459009
Treasury bills: 1 year 0.234082 0.053455 0.065061 -0.036268 -0.075239 -0.075118 0.192900 -0.096072 0.369251 0.282200 -0.215109 0.348370 0.066918 0.001790 0.046062 0.547883 0.120218
CPI Inflaction Rate 0.173762 -0.180793 -0.575582 -0.504796 -0.493601 -0.289764 -0.058610 0.010737 -0.074097 -0.086223 -0.010197 -0.031054 -0.013493 -0.045574 0.023215 -0.007326 0.039455
Morgage Rate 0.227671 0.150522 0.160041 -0.041466 -0.149447 0.043348 0.053773 0.891069 0.145476 -0.116283 0.110641 0.020574 0.072023 0.044917 0.027882 -0.039727 -0.025172
Prime Rate 0.232769 0.079864 0.118729 0.003283 -0.045882 -0.147697 -0.017152 0.019519 -0.508541 0.292196 0.064961 0.185499 -0.310753 0.371407 0.050503 -0.003452 -0.308323
House Price Index(house and land) -0.171404 -0.381353 -0.085620 0.663168 -0.268190 -0.450428 0.298552 0.103229 0.016639 -0.009538 0.011201 -0.002338 -0.013328 0.005792 0.027148 -0.033986 0.003009
Unemployment rate -0.181182 0.436201 0.172573 0.184132 -0.757262 0.304913 -0.061990 -0.167199 -0.067842 -0.048952 -0.015296 -0.015750 -0.028583 -0.042345 0.013702 0.025426 0.041119
Real GDP growth Seasonal adjustment -0.003512 -0.702397 0.578009 -0.267773 -0.242523 0.190165 -0.055179 -0.037586 -0.019916 -0.010577 0.004074 -0.000240 -0.015595 -0.011826 0.009457 0.008515 0.003751
top_indicators_df

PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10 PC11 PC12 PC13 PC14 PC15 PC16 PC17
GOC Marketable Bonds Average Yield: 1-3 year 0.234760 0.026131 0.016201 -0.012893 -0.024366 0.061716 0.291109 -0.139295 0.133451 0.157802 0.515279 0.182155 0.241015 -0.101838 0.360692 -0.184581 -0.003957
GOC Marketable Bonds Average Yield: 3-5 year 0.234132 -0.029851 -0.072510 0.042401 -0.000326 0.170944 0.217199 -0.078918 0.286689 -0.068549 0.178631 -0.298389 -0.702733 0.011258 0.143179 0.156651 0.048250
GOC benchmark bond yields: 2 year 0.234779 0.013805 0.008394 -0.010948 -0.061867 0.068719 0.297248 -0.173324 0.017306 -0.047074 -0.160140 -0.375752 0.524676 0.397393 0.059814 0.058629 -0.202109
GOC benchmark bond yields: 3 year 0.234758 -0.005337 -0.016723 0.008134 -0.029301 0.139984 0.287062 -0.090465 0.039591 0.137423 -0.004180 -0.218928 -0.028056 -0.277546 -0.505661 -0.334065 0.068860
GOC benchmark bond yields: 7 year 0.231707 -0.078183 -0.090904 0.152294 0.065766 0.199976 0.051898 0.010020 -0.347636 -0.330405 -0.001623 0.058658 0.114303 -0.590334 0.179727 0.203702 -0.278951
GOC benchmark bond yields: long term 0.227639 -0.108980 -0.102382 0.240735 -0.052869 0.050951 -0.511097 -0.023798 -0.019783 0.353407 0.502917 -0.195265 0.169617 0.019484 -0.164477 0.134039 0.065868
Treasury bills: 1 month 0.225278 0.128380 0.259529 0.019614 0.025262 -0.352326 -0.132600 -0.161266 0.086808 -0.541618 0.152998 -0.267793 -0.031216 0.162620 -0.103672 0.115781 -0.119419
Treasury bills: 1 year 0.234082 0.053455 0.065061 -0.036268 -0.075239 -0.075118 0.192900 -0.096072 0.369251 0.282200 -0.215109 0.348370 0.066918 0.001790 0.046062 0.547883 0.120218
CPI Inflaction Rate 0.173762 -0.180793 -0.575582 -0.504796 -0.493601 -0.289764 -0.058610 0.010737 -0.074097 -0.086223 -0.010197 -0.031054 -0.013493 -0.045574 0.023215 -0.007326 0.039455
Morgage Rate 0.227671 0.150522 0.160041 -0.041466 -0.149447 0.043348 0.053773 0.891069 0.145476 -0.116283 0.110641 0.020574 0.072023 0.044917 0.027882 -0.039727 -0.025172
Prime Rate 0.232769 0.079864 0.118729 0.003283 -0.045882 -0.147697 -0.017152 0.019519 -0.508541 0.292196 0.064961 0.185499 -0.310753 0.371407 0.050503 -0.003452 -0.308323
House Price Index(house and land) -0.171404 -0.381353 -0.085620 0.663168 -0.268190 -0.450428 0.298552 0.103229 0.016639 -0.009538 0.011201 -0.002338 -0.013328 0.005792 0.027148 -0.033986 0.003009
Unemployment rate -0.181182 0.436201 0.172573 0.184132 -0.757262 0.304913 -0.061990 -0.167199 -0.067842 -0.048952 -0.015296 -0.015750 -0.028583 -0.042345 0.013702 0.025426 0.041119
Real GDP growth Seasonal adjustment -0.003512 -0.702397 0.578009 -0.267773 -0.242523 0.190165 -0.055179 -0.037586 -0.019916 -0.010577 0.004074 -0.000240 -0.015595 -0.011826 0.009457 0.008515 0.003751
 most_important_economic_factors_df

GOC Marketable Bonds Average Yield: 1-3 year GOC Marketable Bonds Average Yield: 3-5 year GOC benchmark bond yields: 2 year GOC benchmark bond yields: 3 year GOC benchmark bond yields: 7 year GOC benchmark bond yields: long term Treasury bills: 1 month Treasury bills: 1 year CPI Inflaction Rate Morgage Rate Prime Rate House Price Index(house and land) Unemployment rate Real GDP growth Seasonal adjustment
Quarter_Year
2020Q1 1.2 1.1 1.1 1.1 1.1 1.4 1.3 1.2 2.0 4.0 1.8 0.2 4.6 -2.1
2020Q2 0.3 0.4 0.3 0.3 0.4 1.1 0.2 0.3 1.6 3.9 1.2 0.1 7.8 -10.6
2020Q3 0.2 0.3 0.3 0.3 0.4 1.1 0.2 0.2 1.4 3.6 1.1 0.7 5.9 8.9
2020Q4 0.2 0.4 0.2 0.3 0.5 1.2 0.1 0.2 1.7 3.4 1.0 0.5 5.2 2.1
2021Q1 0.2 0.5 0.2 0.3 0.9 1.8 0.1 0.1 1.7 3.3 1.1 1.2 5.5 1.2
2021Q2 0.3 0.8 0.4 0.5 1.2 2.0 0.1 0.2 2.4 3.3 1.2 1.3 5.1 -0.1
2021Q3 0.4 0.8 0.5 0.6 1.1 1.8 0.2 0.3 2.9 3.2 1.2 0.5 4.6 1.6
2021Q4 1.0 1.3 1.0 1.1 1.5 1.9 0.1 0.7 3.1 3.4 1.3 0.6 4.1 1.6
2022Q1 1.6 1.9 1.7 1.8 2.0 2.2 0.2 1.4 4.0 3.6 1.6 1.1 4.3 0.8
2022Q2 2.7 2.8 2.7 2.8 2.8 2.9 1.1 2.6 5.3 4.6 2.5 0.3 3.7 1.1
2022Q3 3.5 3.2 3.5 3.4 3.0 2.9 2.8 3.7 5.8 5.6 3.2 0.0 3.6 0.5
2022Q4 3.9 3.5 3.9 3.7 3.1 3.2 3.9 4.4 6.0 5.8 3.7 -0.1 3.5 -0.0
2023Q1 3.9 3.3 3.8 3.6 3.0 3.1 4.3 4.4 5.9 5.8 3.9 -0.2 3.8 0.6
2023Q2 4.1 3.4 4.1 3.8 3.1 3.1 4.5 4.7 5.3 5.8 4.0 0.0 3.7 0.2
2023Q3 4.8 4.1 4.8 4.5 3.8 3.5 4.9 5.2 4.6 6.1 4.3 -0.1 3.7 -0.1
2023Q4 4.3 3.7 4.3 4.1 3.6 3.4 4.9 4.8 4.0 6.4 4.3 -0.1 3.6 0.1
2024Q1 4.2 3.6 4.1 3.9 3.4 3.3 5.0 4.8 3.1 6.2 4.1 0.0 4.0 0.5
2024Q2 4.3 3.8 4.2 4.1 3.7 3.6 4.8 4.6 2.5 6.1 4.1 0.2 4.0 0.4
 most_important_economic_factors_matrix

GOC Marketable Bonds Average Yield: 1-3 year GOC Marketable Bonds Average Yield: 3-5 year GOC benchmark bond yields: 2 year GOC benchmark bond yields: 3 year GOC benchmark bond yields: 7 year GOC benchmark bond yields: long term Treasury bills: 1 month Treasury bills: 1 year CPI Inflaction Rate Morgage Rate Prime Rate House Price Index(house and land) Unemployment rate Real GDP growth Seasonal adjustment
GOC Marketable Bonds Average Yield: 1-3 year 1.000000 0.993209 0.999462 0.998315 0.977915 0.952575 0.958881 0.996983 0.725500 0.972380 0.989560 -0.744187 -0.746375 -0.031706
GOC Marketable Bonds Average Yield: 3-5 year 0.993209 1.000000 0.994331 0.997760 0.993509 0.973534 0.927301 0.983328 0.756007 0.946726 0.971334 -0.693431 -0.785146 -0.017584
GOC benchmark bond yields: 2 year 0.999462 0.994331 1.000000 0.998947 0.979915 0.955397 0.955022 0.996038 0.734083 0.969309 0.987723 -0.735073 -0.750504 -0.021169
GOC benchmark bond yields: 3 year 0.998315 0.997760 0.998947 1.000000 0.986585 0.962796 0.944663 0.992004 0.740256 0.962145 0.982265 -0.720636 -0.765741 -0.014608
GOC benchmark bond yields: 7 year 0.977915 0.993509 0.979915 0.986585 1.000000 0.988589 0.906120 0.963724 0.735470 0.920864 0.954083 -0.626895 -0.806113 0.007730
GOC benchmark bond yields: long term 0.952575 0.973534 0.955397 0.962796 0.988589 1.000000 0.889460 0.940557 0.726525 0.894927 0.935540 -0.562839 -0.794732 0.025609
Treasury bills: 1 month 0.958881 0.927301 0.955022 0.944663 0.906120 0.889460 1.000000 0.974308 0.569049 0.978951 0.987231 -0.765780 -0.633458 -0.047291
Treasury bills: 1 year 0.996983 0.983328 0.996038 0.992004 0.963724 0.940557 0.974308 1.000000 0.709570 0.982448 0.995893 -0.761639 -0.720879 -0.036478
CPI Inflaction Rate 0.725500 0.756007 0.734083 0.740256 0.735470 0.726525 0.569049 0.709570 1.000000 0.629513 0.666817 -0.541372 -0.751794 0.017866
Morgage Rate 0.972380 0.946726 0.969309 0.962145 0.920864 0.894927 0.978951 0.982448 0.629513 1.000000 0.987620 -0.805028 -0.617461 -0.090534
Prime Rate 0.989560 0.971334 0.987723 0.982265 0.954083 0.935540 0.987231 0.995893 0.666817 0.987620 1.000000 -0.762774 -0.694479 -0.048387
House Price Index(house and land) -0.744187 -0.693431 -0.735073 -0.720636 -0.626895 -0.562839 -0.765780 -0.761639 -0.541372 -0.805028 -0.762774 1.000000 0.388159 0.275270
Unemployment rate -0.746375 -0.785146 -0.750504 -0.765741 -0.806113 -0.794732 -0.633458 -0.720879 -0.751794 -0.617461 -0.694479 0.388159 1.000000 -0.353492
Real GDP growth Seasonal adjustment -0.031706 -0.017584 -0.021169 -0.014608 0.007730 0.025609 -0.047291 -0.036478 0.017866 -0.090534 -0.048387 0.275270 -0.353492 1.000000
Scenario Analysis¶

Macroeconomics KPI Best Case Scenario - Worst Case Scenario and Normal Case Scenario

In [65]:
#under implementation
Recalculate Portfolio Key Performence metricstrics¶

recalculate Expected return, Standard deviation (risk), and Value-at-Risk (VaR).

In [66]:
#under implementation
Visualize the Stress Test Results¶

Visualizing the impact of the stress scenario on the portfolio can help in understanding the potential risks.

Decision Trees in Portfolio Stress Testing¶

In this section, we will use Decision Tree to model how different scenarios might cascade through the portfolio, affecting asset values, returns, and overall portfolio performance.

In [67]:
#under implementation
Interpret the Results¶
  • Expected Return under Stress: Indicates how much the portfolio's return is expected to decrease under the stress scenario.
  • Portfolio Risk under Stress: Shows how much the risk (volatility) increases under the stress scenario.
  • VaR under Stress: Quantifies the potential loss in the portfolio's value at a specified confidence level under stressed conditions.
In [68]:
#under implementation
In [ ]:
 
In [ ]: